DOMAIN ISOLATION THROUGH VIRTUAL
NETWORK MACHINES

BACKGROUND OF THE INVENTION

1. Field of the Invention 

The present invention relates in general to communications
networks, and more particularly, to the operation of
network devices that can operate in multiple virtual
networks simultaneously.

2. Description of the Related Art 

Network Layering and Protocols 

A communication network provides information resource
transfer services that transfer information resources
among devices attached to the network. Information
resources, as the term is used herein, include any form
of information that can be transmitted over a network for use
by or with any end station or network device connected to
the network. Information resources, for example, may
include computer programs, program files, web pages, data,
database information, objects, data structures, program
icons, graphics, video information or audio information.
Computer Networks and Internets, Douglas E. Comer, Prentice
Hall, 1997, provides extensive information about
communication networks.

Networks are built from devices or stations called nodes,
and the communications channels that interconnect the
nodes, called links. A set of nodes and links under one
administrative authority is called a network domain.
Communication between end stations attached to a network
ordinarily is achieved through the use of a set of layered
protocols. These protocols are generally described by
reference to the Open Systems Interconnection (OSI) computer
communications architecture. The standard OSI architecture
includes seven layers: application, presentation, session,
transport, network, data link and physical. A communication
network may employ fewer than the full seven layers.
However, the layer 2 and the layer 3 software protocols
ordinarily play a prominent role in the transfer of
information between interconnected networks and between end
stations connected to the networks.

The physical layer is the lowest layer (layer 1) of the OSI
model. There are numerous technologies that can be
employed to build networks at layer 2. Layer 2 networks can
be "connection oriented", meaning that a connection must
be established before data can flow between two stations;
ATM, Frame Relay, and X.25 are examples of connection
oriented layer 2 protocols. Layer 2 networks can also be
connection-less, meaning data can be transmitted without
establishing any connection in advance; Ethernet and FDDI
are two examples of connection-less layer 2 protocols.
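
The difference between the two kinds of layer 2 service can be sketched in a few lines of Python. The class and function names below (ConnectionOrientedLink, ConnectionlessLink, try_send) are hypothetical and do not model any real protocol stack; the sketch only captures the rule that a connection oriented service refuses data until a connection exists, while a connection-less service accepts it at once.

```python
class ConnectionOrientedLink:
    """Toy layer 2 service that requires setup before data can flow."""

    def __init__(self):
        self.connected_peers = set()
        self.delivered = []

    def connect(self, peer):
        # Connection establishment must precede any data transfer.
        self.connected_peers.add(peer)

    def send(self, peer, payload):
        if peer not in self.connected_peers:
            raise RuntimeError(f"no connection to {peer}")
        self.delivered.append((peer, payload))


class ConnectionlessLink:
    """Toy layer 2 service that accepts data with no prior setup."""

    def __init__(self):
        self.delivered = []

    def send(self, peer, payload):
        self.delivered.append((peer, payload))


def try_send(link, peer, payload):
    """Return True if the link accepted the payload, False otherwise."""
    try:
        link.send(peer, payload)
        return True
    except RuntimeError:
        return False
```

In this sketch a send on the connection oriented link fails until connect has been called for the same peer, whereas the connection-less link never fails for lack of setup.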

In order to provide services useful to end users, the
devices in a network must perform higher layer functions to
create what are called "virtual networks". The "Internet" is
one example of a very popular and public virtual network.
The Internet uses the IP protocol to provide the higher layer
(layer 3) functions required to support operation of the
virtual network. There are many other private (virtual)
networks that also use the IP protocol. The term "internet"
with a small "i" is used to differentiate between these less
well known private internets, and the very popular and
public large "I" Internet. There are many other protocols that
can be used to construct virtual networks at layer 3,
including IPX, DECnet, AppleTalk, CLNP, etc. There are many
other private and public networks using these other layer 3
protocols, either independent of or in conjunction with the IP
protocol.

Thus, networks can be built at many different layers. Each 
layer has its own function and its own type of nodes and 
links. Higher layer networks are built "on top of" lower layer
networks. In other words, nodes at a given layer may use the 
services of the next lower layer to provide links for com- 
munication with peer nodes (i.e. nodes at the same layer on 
other devices). Routers are examples of nodes in a layer 3 
network. Bridges are examples of nodes in layer 2 networks. 

Network Domains 

A network domain as the term is used herein refers to the 
set of nodes and links that are subject to the same admin- 
istrative authority. A single administrative authority may 
administer several networks in separate domains, or several 
layers of the same network in a single domain, or any 
combination. There are actually several possible adminis- 
trative domains in any large virtual network. The boundaries 
of a network domain can be defined along the lines dividing 
layers of the protocol stacks. For instance, the same layer 1
physical devices and physical connections may have several
layer 2 network domains layered onto them. These layer 2
domains, in turn, may have one or more layer 3 domains
layered on top of them. A network domain may even
transcend the boundaries between layers such that a layer 2 
network and a layer 3 network may be part of the same 
network domain. 

The administration of even a single network domain can 
be quite complex. Virtual networks have administrative 
authorities associated with them to control their higher layer 
functions. The cost of administering a network, physical or 
virtual, can be enormous, and is often the largest cost item 
in the operations of a network.

When several virtual networks are layered on top of the
same layer 2 service or another virtual network, the bound- 
aries between network domains may be somewhat obscure. 
The boundaries between the domains of the overlaid virtual 
networks intersect at points where they must share physical 
or virtual resources. In practice, the administrators of the 
overlaid virtual networks are very concerned about sharing 
resources, especially when they are competing commercial 
entities. Concerns arise about integrity, privacy, and security 
of data and network control information flowing across the 
shared resources at the lower layers. The administrators of 
the underlying networks are called upon to solve complex
administrative problems. The cost of administering these
networks increases quickly with the number of virtual
networks, their size, the complexity and compatibility of 
their individual policies, and increased demands for security, 
integrity, and isolation between domains. 

Network Devices and Databases 

The term network device is used here to refer to the 
collection of mechanisms (e.g. computer and communica- 
tions hardware and software) used to implement the func- 
tions of a station in a network. A network device contains 
some capacity to store and operate on information in data- 
bases in addition to the ability to transmit and receive 
information to and from other devices on the network. 
Examples of network devices include but are not limited to 
routers, bridges, switches, and devices that perform more 
than one of these functions (e.g. a device that does both 
routing and bridging). 

A router is an example of a network device that serves as 
an intermediate station. An intermediate station is a network
device that interconnects networks or subnetworks. A typical
router comprises a computer that attaches to two or more 
networks and that provides communication paths and rout- 
ing functions so that data can be exchanged between end 
stations attached to different networks. A router can route 
packets between networks that employ different layer 2 
protocols, such as Token Ring, Ethernet or FDDI, for 
example. Routers use layer 3 protocols to route information 
resources between interconnected networks. Nothing pre- 
cludes a network device that operates as an intermediate 
station from also operating as an end station. An IP router for 
example typically also operates as an end station. 

A router can understand layer 3 addressing information,
and may implement one or more routing protocols to determine
the routes that information should take. A multiprotocol
router runs multiple layer 3 protocols such as IP, IPX
or AppleTalk, for example. A router can also be characterized
as being multiprotocol if it runs multiple adaptive routing
protocols such as RIP, BGP or OSPF, all feeding a single IP
layer.

The network device router configuration of FIG. 1A 
depicts what is often referred to in industry as a multi- 
protocol bridge/router. In this illustrative example, there are 
separate databases for three layer 2/3 networking protocols: 
bridging, IP routing, and IPX routing. The example IP 
database employs both the OSPF and RIP dynamic routing 
protocols. Thus, the intermediate station node of FIG. 1A 
includes both multiple networking protocols and multiple 
routing protocols. 

A bridge is another example of a network device that
serves as an intermediate station. A typical bridge comprises
a computer used to interconnect two local area networks
(LANs) that have similar layer 2 protocols. It acts as an
address filter, picking up packets from one LAN that are
intended for a destination on another LAN and passing those
packets on. A bridge operates at layer 2 of the OSI
architecture.

The term network database will be used to refer to all the 
control information housed in a network device required to 
support the device's operation in a set of one or more 
networks. Each device in a network holds its own network 
database. In order for the network at large to operate 
properly, the network databases of all network devices in a 
network domain should be consistent with each other. The 
network database control information defines the behavior 
of its network device. For example, not only might it 
determine whether the network device will function as a 
router or a bridge or a switch, but also it will determine the 
details of how the device will perform those functions. 

When a network device is deployed to operate in multiple 
domains, its network database can become quite complex. 
The cost of administering the network device increases 
significantly when the network database is more complex. 
The cost of administration is already the most significant 
cost of operating many networks, and the trend toward 
greater complexity through greater use of virtual networking 
continues unabated. 

The information found in a typical network database
includes, but is not limited to, data used to configure,
manage, and/or monitor operations of:

Communications Hardware (e.g. layer 1 transceivers/drivers/chips etc.)
Computer Hardware
Computer Software
Layer 2 Addressing
Layer 2 Connections (Layer 2 interfaces)
Traffic filter policies
Bridging (IEEE 802.1D)
Bridge filters and/or policies
Network (layer 3) Addressing
Layer 3 Connections (Layer 3 interfaces)
(Network/layer 3) Address Translation (NAT) policies
Access Control (e.g. user names and passwords)
Access policies (e.g. what user can use what services)
Routing (IETF RFC 1812)
Routing Protocols (e.g., BGP, OSPF, RIP, IGRP, etc.)
Route filters and policies (e.g. route leaking)
Tunneling
Tunneling Protocols (e.g., L2TP, GRE, PPTP, etc.)

A single network device can operate in one or more
(virtual) network domains. For each domain in which a
device operates, it needs to store information about that
domain in some database form.
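
As a rough illustration of a device holding one body of control information per domain, the following Python sketch groups a few of the categories listed above under a per-domain record. The names DomainSection, NetworkDatabase and section_for, the device name "example-router", and the domain names are all hypothetical; only a handful of the listed categories are represented.

```python
from dataclasses import dataclass, field


@dataclass
class DomainSection:
    """Control information a device holds for one (virtual) network domain."""
    layer3_addresses: list = field(default_factory=list)
    routing_protocols: list = field(default_factory=list)
    tunneling_protocols: list = field(default_factory=list)
    access_policies: dict = field(default_factory=dict)


@dataclass
class NetworkDatabase:
    """All control information housed in a single network device."""
    device_name: str
    domains: dict = field(default_factory=dict)

    def section_for(self, domain):
        # Create the section on first use so that every domain the
        # device operates in has exactly one entry.
        return self.domains.setdefault(domain, DomainSection())


# Hypothetical device operating in two domains.
db = NetworkDatabase("example-router")
db.section_for("domain-a").routing_protocols += ["OSPF", "RIP"]
db.section_for("domain-b").tunneling_protocols.append("L2TP")
```

The point of the sketch is only that the amount of stored information grows with each additional domain in which the device operates.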

Much of the information in a network database must be
configured manually, particularly the policy information, as
it must reflect the administrator's subjective wishes for how
the network should operate. Manual configuration involves
human effort, which can become expensive, especially as the
number of policies and their complexity increase. Network
administrative chores include the assignment of user names,
passwords, network addresses or other user identifiers, and
configuration of policy databases. This configuration and
management may be used to establish traffic filtering
policies such as what kind of information payloads will be
carried. Traffic and route filtering policies may be
established to determine what paths through the network will be
used for each payload carried. Access control policies may
be established to dictate which users at which end stations
have access to which services at other end stations. Security
policies may be established to ensure the integrity of the
information payloads. Each configured bit of policy somehow
finds its way into the network database of the device
implementing the policy.

Cisco Router Configuration by A. Leinwand, B. Pinsky
and M. Culpepper, published by MacMillan Technical
Publishing, Indianapolis, Ind., 1998 provides an extensive
treatment of the configuration of the databases of Cisco
Systems routers. This is just one example of a network device
database.

Building Virtual Networks 

The layering of software protocols in accordance with the
ISO architecture makes possible the creation of "virtual
networks". Virtual networks are to be contrasted with physical
networks. Two physical networks which have no physical
devices or links in common can be said to be physically
isolated from each other. Physical isolation may be required
in order to ensure that a network has the highest levels of
security and integrity.

Physical networks are defined at layer 1 of the OSI model.
Virtual networks, on the other hand, are created at higher
layers. It is possible to create multiple virtual networks all
sharing common physical resources. A network is definitely
virtual if it shares a common physical medium or device,
such as an intermediate station, with any other (virtual)
network. There are many conventional technologies and
many commercially available products which can be used to
build many types of virtual networks. For example, virtual
circuits are a layer 2 construct that can be employed to create
virtual networks.

It has been common practice in the industry for phone
companies to offer connection oriented layer 1 and 2 services
to Internet Service Providers (ISPs), corporations, and
residential customers. These customers may build one or
more higher layer (layer 3 and above) virtual networks on
top of such publicly available layer 1 and 2 services. The
higher layer virtual networks share a common set of layer 1
and 2 services, each having its private set of virtual circuits.

PCs and servers are examples of end stations. End
stations located at a home or business, for example, may
connect into an internet through an internet service provider
(ISP). There are regional, local and global ISPs. In most
cases, local ISPs connect into the regional ISPs which in turn
connect into other regional or national ISPs. FIG. 1B
illustrates an example of connections to an ISP. In the example,
home user end stations may connect via modems over
dial-up lines to an ISP's router or remote access server
(RAS). This data link often runs the PPP (Point-to-Point
Protocol) which encapsulates and delivers packets to the
ISP's site. Business user end systems may connect to the ISP
through leased lines such as T1 lines or T3 lines, depending
on bandwidth requirements, for example. Other examples of
typical connection options between home or business users
and an ISP include ISDN, T1, fractional T1, various optical
media, and xDSL. ISPs may also offer tunnel mode or
transport mode services that help businesses set up virtual
private networks (VPNs) between remote end stations and
virtual dial-up services for remote and mobile end stations.

The ISP serves as a conduit for information transmitted
between the end stations in the home and other end stations
connected to the Internet.

A virtual circuit is a dedicated communication channel
between two end stations on a packet-switched or cell-relay
network. ATM, Frame Relay, and X.25 are all different types
of virtual circuit based networking technologies. A virtual
circuit follows a path that is programmed through the
intermediate stations in the network.

There are permanent and switched virtual circuits. A
permanent virtual circuit (PVC) is permanent in the sense
that it survives computer reboots and power cycles. A
PVC is established in advance, often with a predefined and
guaranteed bandwidth. A switched virtual circuit (SVC) is
"switched" in the sense that it can be created on demand,
analogous to a telephone call. Both PVCs and SVCs are
"virtual" circuits in that they typically are not allocated their
own physical links (e.g. wires), but share them with other
virtual circuits running across the same physical links.

"Tunneling" is one mechanism for building higher layer
networks on top of an underlying virtual network. Tunneling
has already gained acceptance in the industry and several
technologies are either in operation or under development.
Some of the tunneling protocols used in IP networks, for
example, include L2TP, GRE, PPTP, and L2F. There are
many other tunneling technologies used in IP and other
protocols.

Referring to FIGS. 2A-2B, there are shown network
graphs representing two illustrative networks. Network A is
represented by three nodes (NA1, NA2, and NA3), and three
links (LA1, LA2, and LA3). Network B is represented by
four nodes (NB1, NB2, NB3, and NB4) and four links (LB1,
LB2, LB3, and LB4). As used herein, the term node may
represent any end station or intermediate station, and the
term link means any connection between nodes. If these are
physical nodes and links, Networks A and B are physically
isolated from each other. If these are virtual (circuit) links
which actually depend on a shared physical medium, then
the two (virtual) networks are said to be virtually isolated
from each other.
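
The isolation test described above, namely that two networks have no nodes or links in common, can be written down directly. The sketch below is a minimal Python illustration using the node and link labels of Networks A and B; the dictionary layout and the function names shared_elements and is_isolated are assumptions made for the example.

```python
NETWORK_A = {
    "nodes": {"NA1", "NA2", "NA3"},
    "links": {"LA1", "LA2", "LA3"},
}
NETWORK_B = {
    "nodes": {"NB1", "NB2", "NB3", "NB4"},
    "links": {"LB1", "LB2", "LB3", "LB4"},
}


def shared_elements(net_x, net_y):
    """Return the nodes and links that appear in both networks."""
    common_nodes = net_x["nodes"] & net_y["nodes"]
    common_links = net_x["links"] & net_y["links"]
    return common_nodes | common_links


def is_isolated(net_x, net_y):
    """Two networks are isolated when they have no node or link in common."""
    return not shared_elements(net_x, net_y)


print(is_isolated(NETWORK_A, NETWORK_B))  # prints True
```

Whether the result denotes physical or virtual isolation depends, as stated above, on whether the labels stand for physical elements or for virtual links carried on a shared medium; the sketch itself cannot tell the two apart.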

Illustrative Networks A and B each may be part of 
different network domains. Independent administrative con- 
trol may be exercised over each of the Network A and B 
domains, for example, through the configuration and man- 
agement of intermediate stations such as bridges and routers. 

Referring to FIGS. 2A and 2B, it will be appreciated that
the independent administration of the Network A and Net- 
work B domains may result in incompatible policies as 
between the two domains. This is not a problem provided 
that the domains remain isolated from each other. Referring 
to FIG. 3, however, there is shown a network graph of 
Network C which comprises Networks A and B joined by 
link LJ. The isolation between Networks A and B, whether 
physical or virtual, is lost when they are joined in Network 
C. This joining of the two Networks A and B may create 
challenges to the administration of combined Network C. 
For example, despite the joining of the two networks, there 
still may be a need to apply different or even conflicting 
policies to each of Networks A and B. In essence, the 
administrative challenge is to maintain the administrative 
integrity of the Network A domain and the administrative 
integrity of the Network B domain despite the fact that both 
of these networks are part of Network C and are no longer 
physically isolated from each other. 

FIG. 4 is an illustrative drawing of a segment of a single 
physical medium capable of carrying multiple information 
flows, each in its own virtual circuit (or channel). The 
physical medium may for instance be a cable or a wire or an 
optical fiber. The segment shown is carrying four indepen- 
dent information flows on four different virtual circuits; 
VC1, VC2, VC3, and VC4. These virtual circuits, for 
example, may be implemented using X.25, ATM, Frame 
Relay, or some other virtual circuit (or channelized) service. 
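
One way to picture several independent flows on a single medium is to tag each unit of data with the identifier of its virtual circuit and separate the flows at the receiving end. The Python sketch below assumes such tagging purely for illustration; the frame contents and the function name demultiplex are hypothetical, and real virtual circuit services such as X.25, ATM or Frame Relay each use their own identifier formats.

```python
from collections import defaultdict


def demultiplex(frames):
    """Group payloads by virtual circuit identifier, preserving order."""
    flows = defaultdict(list)
    for vc_id, payload in frames:
        flows[vc_id].append(payload)
    return dict(flows)


# Frames as they might appear interleaved on the one physical segment.
frames_on_medium = [
    ("VC1", "a1"), ("VC3", "c1"), ("VC2", "b1"),
    ("VC1", "a2"), ("VC4", "d1"), ("VC3", "c2"),
]
flows = demultiplex(frames_on_medium)
```

Each entry of flows then holds only the payloads of one virtual circuit, in the order in which they crossed the shared segment.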

FIG. 5 is an illustrative drawing representing an example
of two virtual networks (VN1 and VN2), each made up of
two independent network segments (VN1.1 and VN1.2 for
VN1, and VN2.1 and VN2.2 for VN2). All segments connect
to shared physical network resources. In this example, the
shared network resources of FIG. 5 provide a virtual circuit
service. The connection of an end station or intermediate
station to a virtual circuit is called
a virtual channel connection (VCC). VN1 connects at VCC1
and VCC4; and VN2 connects at VCC2 and VCC3. The
shared network resources also provide virtual circuit service
that connects VCC1 to VCC4 so as to join VN1.1 and
VN1.2 into VN1, and VCC2 to VCC3 so as to join VN2.1
and VN2.2 into VN2.

FIG. 6 is an illustrative drawing that provides additional
details of some of the physical constituents of the virtual
networks of FIG. 5. An intermediate station labeled
VN1.1.VCC1 in VN1 connects segment VN1.1 to the VC
service at VCC1. An intermediate station labeled
VN1.2.VCC4 in VN1 connects segment VN1.2 to the VC
service at VCC4. The VC service connects VCC1 to VCC4,
linking VN1.1 to VN1.2 at the virtual circuit level. More
specifically, physical media segments PM2, PM1 and PM5
and intermediate stations IS-A and IS-B provide the requisite
physical infrastructure upon which the virtual circuit
connection linking VN1.1 and VN1.2 is carried. This first
virtual circuit connection serves as a network link between
the VN1.1.VCC1 and VN1.2.VCC4 intermediate stations, to
create one virtual network from the two segments VN1.1
and VN1.2.

Similarly, VCC2 and VCC3 are connected by the virtual
circuit service, which connects intermediate stations
VN2.1.VCC2 and VN2.2.VCC3, joining the VN2.1 and
VN2.2 segments to form the virtual network labeled VN2.
More particularly, physical media segments PM4, PM1 and
PM3 and intermediate stations IS-A and IS-B provide the
virtual connection linking VN2.1 and VN2.2. The second
virtual circuit connection serves as a network link between
the VN2.1.VCC2 and VN2.2.VCC3 intermediate stations, to
create one virtual network from the two segments VN2.1
and VN2.2.
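
The extent to which the two virtual circuits of FIG. 6 rely on the same physical resources can be read off from the media segments and intermediate stations named above. The following Python sketch records those names and intersects them; the dictionary layout and the function name shared_resources are assumptions, and the media lists are given in the order in which the text names them, which is not necessarily the order of traversal.

```python
VC_PATHS = {
    # Virtual circuit joining VCC1 and VCC4 (the two segments of VN1).
    "VN1": {"media": ["PM2", "PM1", "PM5"], "stations": ["IS-A", "IS-B"]},
    # Virtual circuit joining VCC2 and VCC3 (the two segments of VN2).
    "VN2": {"media": ["PM4", "PM1", "PM3"], "stations": ["IS-A", "IS-B"]},
}


def shared_resources(paths, first, second):
    """Return the physical media and stations used by both circuits."""
    media = set(paths[first]["media"]) & set(paths[second]["media"])
    stations = set(paths[first]["stations"]) & set(paths[second]["stations"])
    return sorted(media), sorted(stations)


media, stations = shared_resources(VC_PATHS, "VN1", "VN2")
print(media, stations)  # prints ['PM1'] ['IS-A', 'IS-B']
```

Under these assumptions, segment PM1 and both intermediate stations carry traffic for both virtual networks, which is exactly where isolation between VN1 and VN2 has to be enforced.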

FIG. 7 is an illustrative drawing that shows the logical or
higher level view of the two virtual networks VN1 and VN2
of FIGS. 5 and 6. It will be appreciated from the view of
FIG. 6 that they share physical resources, and it will be
appreciated from the view of FIG. 7 that they are logically
or virtually separate.

In the illustrative example of FIG. 8, two virtual networks 
are layered on top of a third virtual network. The sharing of
a common set of physical or virtual network resources by 
several virtual networks increases the challenges of main- 
taining isolation and security of the individual virtual net- 
works. Nevertheless, end user requirements for information 
resources, technology advances, economics, politics, and 
regulations surrounding the networking industry are driving 
commercial, private and government entities to share com- 
mon physical and virtual network infrastructure. Therefore, 
there are ever increasing demands imposed upon network
administrators, and vendors of networking equipment. 

In the illustrative drawing of FIG. 8, three separate
network domains intersect at node IN1: i) that of the Internet
itself (including or subsuming that of the underlying VC
service supporting the Internet); ii) that of private virtual
network VN1; and iii) that of private virtual network VN2.
This intersection of three network domains creates the
potential for the kinds of administration and policy
challenges discussed above. It will be noted that these networks
are represented by different network "clouds" that symbolize
the multifarious nodes and links in each of the networks.

The illustrative drawing of FIG. 8 illustrates an example 
of building two virtual networks on top of another virtual 
network similar to the previous example in FIGS. 5, 6 and 
7. As before, the virtual networks being overlaid are each 
composed of two segments. Using a tunneling protocol or 
some other higher layer (layer 3 or above) mechanism, 
connections are made between nodes IN1.1 and IN1.2 to 
form a link to tie the two segments of VN1 together. This
link is shown as T1 in FIGS. 9 and 10. Link T2 is similar,
formed between nodes IN2.1 and IN2.2, to tie the two 
segments of VN2 together. The logical view of the two 
virtual networks in FIG. 9 is shown in FIG. 10, which bears 
a very strong resemblance to FIG. 7. The important differ- 
ence to note between the examples is that in FIG. 7 a layer 
2 VC network was used as the underlying network shared 
resources, and in FIG. 10 another virtual network was used 
as the underlying network shared resources; specifically, a 
tunneled service across the Internet. Thus, it will be appre- 
ciated that different virtual networks can be formed in 
different layers using the same underlying physical (or 
virtual) network resources. 

Connections are established between nodes at the edge of 
the segments where they interface or connect to the shared 
(Internet) resources which are analogous to the virtual 
circuits in FIGS. 5, 6, and 7. These may be tunneled 
connections, or connections built using some other 
(connection-less) technology. 

If we assume T1 and T2 are tunnels, the network databases
of IN1.1, IN1.2, IN2.1, and IN2.2 would be augmented
with data structures to manage the tunneling protocol
at those endpoints, and the links made up from the
tunnels. The network database of IN1.1 of FIG. 8 is depicted
in FIG. 11, which highlights the "Tunneling Database" and
the "IP Database".

Network Database Organization 

If we examine the information in the network database for
IN1, we will see that it should include configuration and
policy information for three separate domains. Furthermore,
since the information from the three domains must all
coexist in the same physical device, there should be some
way to structure the information and control its usage, such
that the IN1 device operates correctly in all three domains.
If all information for the device IN1 were stored in one
monolithic form as is done conventionally, in addition to all
the policies for each domain, inter-domain policies would
also be required to ensure that information is kept
private to its own domain.

The illustrative drawing of FIG. 12 is a generalized
drawing of a conventional monolithic structure for a database
that can be used to implement node IN1 of FIG. 7. The
drawing depicts, in a conceptual fashion, an example of the
typical organization of information within such a device.
The illustrative device includes a first interface attached to
VN1.1, a second interface attached to VN2.2 and a third
interface attached to the Internet as the shared network
resources. To illustrate the complexities in the database
design, assume that both the virtual networks being overlaid
on the Internet are also (private) IP networks (internets).
Therefore all three networks/domains operate using the IP
protocol, each having its own independent IP information to
be stored in IN1's network database.

The database includes information such as rules used to
articulate and implement administrative policies. The policies
as articulated in the information and rules, for example,
may include security rules, restrictions on access and
dynamic routing protocols. In this illustrative router, the
policy information and policy rules used to control the layer
3 IP protocol routing for all three networks are included in
a single monolithic database.

However, as explained above, different network domains
may have different or perhaps even conflicting policies. In
order to provide at least some degree of isolation, additional
and complicated "inter-domain" policy mechanisms must be
added to manage the conflicts between policies on similar
data from different domains. These mechanisms are configured
and managed by an administrative authority. The dotted
lines in FIG. 12 represent the points at which these
inter-domain policy mechanisms would be introduced. The
policies would attempt to divide the monolithic network
database of node IN1 into three separate domain-specific
sections. These dotted lines indicate that separation policy
mechanisms are implemented, to provide at least some
isolation of the information pertaining to VN1 from the
information pertaining to VN2, and also from the information
pertaining to the Internet (i.e. shared network
resources).
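
To make the role of an inter-domain policy mechanism concrete, the Python sketch below models a single monolithic IP route table holding entries for VN1, VN2 and the Internet, and a lookup that must consult a separate policy before using any entry. Everything in it is hypothetical: the prefixes, the next-hop names, the POLICY table (including the assumption that each private network may also use Internet routes) and the function name lookup. It is not a description of any actual router implementation.

```python
import ipaddress

# One monolithic table: (domain, destination prefix, next hop).
# Both private networks are assumed to use the same address range.
MONOLITHIC_ROUTES = [
    ("VN1", ipaddress.ip_network("10.1.0.0/16"), "to-VN1.2"),
    ("VN2", ipaddress.ip_network("10.1.0.0/16"), "to-VN2.2"),
    ("Internet", ipaddress.ip_network("0.0.0.0/0"), "to-ISP"),
]

# Inter-domain policy: which domains' routes each source domain may use.
POLICY = {
    "VN1": {"VN1", "Internet"},
    "VN2": {"VN2", "Internet"},
    "Internet": {"Internet"},
}


def lookup(destination, source_domain, allowed):
    """Longest-prefix match restricted by an inter-domain policy.

    Without the `allowed` check, a packet from VN1 could match the
    VN2 entry for the same prefix, leaking traffic across domains.
    """
    address = ipaddress.ip_address(destination)
    candidates = [
        (network.prefixlen, next_hop)
        for domain, network, next_hop in MONOLITHIC_ROUTES
        if domain in allowed[source_domain] and address in network
    ]
    if not candidates:
        return None
    # The candidate with the longest prefix wins.
    return max(candidates)[1]
```

In this sketch the same destination address resolves to a different next hop depending on the source domain, and the correctness of that result rests entirely on the separately maintained POLICY table.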

It will be appreciated that the complexity and difficulty in
defining and administering the policy mechanisms used to
achieve isolation can be great. There is potential for a wide
range of policies to be defined between domains. Everything
in the spectrum, from almost complete openness and sharing
of all information between domains to the other extreme of
not sharing anything at all, is possible. Certain pieces of a
domain's database may need to be kept private (e.g. access
control policy configuration), while other parts are shared to
some extent (e.g. summarized routing and addressing
information). The types of data, and the extent to which they
can be shared, are all subject to restriction through
definition of inter-domain policies.

If we consider each boundary between a pair of domains
(i.e. each dotted line through the network database of IN1 in
FIG. 12) as a separate policy object, it will also be appreciated
that the number of policy objects increases much
faster than the number of domains. If D is the number of
domains, then P, the number of policy objects, can be
calculated approximately as:

$$P = \frac{D(D-1)}{2}$$

Thus, the number of policy objects increases approximately
as (a proportion of) the square of the number of domains. In
other words, the number of policy objects ordinarily
increases much faster than the number of domains,
especially as the number of domains gets large.
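
The growth can be checked numerically. The short Python sketch below counts one policy object per pair of domains, that is P = D(D - 1)/2; the function name policy_objects is hypothetical.

```python
def policy_objects(domains):
    """Number of pairwise domain boundaries: P = D(D - 1) / 2."""
    if domains < 0:
        raise ValueError("number of domains cannot be negative")
    return domains * (domains - 1) // 2


for d in (3, 10, 100):
    print(d, policy_objects(d))
# prints:
# 3 3
# 10 45
# 100 4950
```

For the three domains intersecting at one device this gives three policy objects, while one hundred domains would require 4950.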

Another challenge in the administration of virtual networks
arises because home or business end station users
may wish to change the nature of their connections to the
network from time to time. For instance, an end user may
wish to utilize a more expensive higher bandwidth connection
for business use and a less expensive lower bandwidth
connection for home or personal use. Alternatively, for
instance, an end user may opt to receive a video
transmission on a higher bandwidth connection while still
receiving other transmissions on lower bandwidth connections.
An end user may even wish to change the ISP that he
or she uses. Unfortunately, these changes often require
intervention by a network administration authority to change
the higher level binding between the end user station and the
network. More specifically, the binding (or association)
between the layer 2/1 virtual circuit service and a layer 3
intermediate device is 'hard', not dynamic, and the higher
layer interface generally must be reconfigured by a network
administrator to change the binding.

Thus, there has been a need for improved organization of
network domain databases and improvements in the ability
of a network user to change network domain. The present
invention meets these needs.

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1A is a generalized diagram of a multi-protocol
bridge/router.

FIG. 1B is an illustrative example of the topology of and
connections. 

FIGS. 2A and 2B are network graphs of two illustrative
example networks.

FIG. 3 is a network graph of an illustrative network in 
which the networks of FIGS. 2A and 2B are joined. 

FIG. 4 is an illustrative drawing of a segment of a single
physical medium capable of carrying multiple information
flows, each in its own virtual circuit (or channel).

FIG. 5 is an illustrative drawings of two virtual network 
each made up of two independent segments. 

FIG. 6 is an illustrative drawings that provides additional 
details of some of the physical constituents of the virtual gg 
networks of FIG. 5. 

FIG. 7 is an illustrative drawings which shows the logical 
or higher level view of the two virtual network VN1 and 
VN2 of FIGS. 5 and 6. 

FIG. 8 is an illustrative drawings lhat shows that the 65 
Internet can provide the shared network resources of FIGS. 
5 and 6. 
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FIG. 9 is an illustrative drawings that shows tunneling 
through the Internet to provide the shared resources of FIGS. 
5 and 6. 

FIG. 10 is a logical or high level view of the two virtual 
networks of FIG. 9. 

FIG. 11 is a generalized illustrative drawing of the orga- 
nization of node INI to achieve tunneling. 

FIG. 12 is a conceptual drawing of one possible router 
configuration that can be used to implement intermediate 
node INI of FIG. 7. 

FIG. 13 is a generalized block diagram of a network 
device that instantiates multiple virtual network machine 
routers in electronic in accordance with one embodiment of 
the invention. 

FIG. 14 is a generalized block diagram of a network 
device that instantiates a virtual network machine with 
multiple layer 2 sub-interface data structures and multiple 
layer 3 interfaces and binding data structures that associate 
layer 2 sub-interface data structures and layer 3 interfaces. 

FIG. IS is a generalized block diagram of the network 
device of FIG. 14, except that one binding data structure has 
been removed and another binding data structure has been 
created. 

FIG. 16 is a generalized block diagram of a network 
device that implements a virtual network machine router and 
a virtual network machine bridge. 

FIG. 17 is a generalized block diagram of the network 
device as in FIG. 16, except that one binding data structure 
has been removed and another binding data structure has 
been created. 

FIG. 18 is a generalized block diagram of the network 
device of FIG. 14, except that one binding data structure has 
been eliminated and another binding data structure has been 
created. 

FIG. 19 is a generalized block diagram of a network 
device which comprises a computer which instantiates mul- 
tiple virtual machines in accordance with an embodiment of 
the invention. 

FIG. 20 is a generalized block diagram of the network
device of FIG. 19 except that one binding data structure has 
been removed and another binding data structure has been 
created. 

FIG. 21 is a generalized block diagram of a subscriber 
management system in accordance with a presently pre- 
ferred embodiment of the invention. 

DETAILED DESCRIPTION OF THE 
PREFERRED EMBODIMENTS 

The present invention comprises a novel apparatus and 
method for managing operation of network devices that can 
operate in multiple virtual network domains. The following 
description is presented to enable any person skilled in the 
art to make and use the invention, and is provided in the 
context of particular applications and their requirements. 
Various modifications to the preferred embodiments will be 
readily apparent to those skilled in the art, and the generic 
principles defined herein may be applied to other embodi- 
ments and applications without departing from the spirit and 
scope of the invention. Thus, the present invention is not 
intended to be limited to the embodiments shown, but is to 
be accorded the widest scope consistent with the principles 
and features disclosed herein. 

Virtual Network Machines 

The term Virtual Network Machine (VNM) is used herein to
describe the collection of processes and mechanisms that
operate on a network device to implement the functions of a
node in a virtual network. The preferred embodiment for the
VNM is as a set of computer programs and related data
structures encoded in electronic memory of a network device
and used to operate on information, consuming some portion
of a network device's computer and memory storage capacity.
The functionality of a virtual network machine can be that
of a router, bridge or switch, depending on what is
configured in its network database. The native resources of
a network device include its processor(s), memory, I/O,
communication hardware and system software. The native
resources of a network device, for example, may include
peripheral devices or even a server computer which may, for
instance, provide information about end user privileges or
virtual network configurations.

Referring to the illustrative drawing of FIG. 13, there is
shown a generalized block diagram of a new structure for the
network database of node IN1 from FIGS. 8 and 12 in
accordance with one embodiment of the invention that
supports creation of multiple virtual network machines. In
this case, the network device IN1 supports three virtual
network machines VNM0, VNM1 and VNM2. In the embodiment of
FIG. 13, assuming again that all three virtual networks
operate using the IP protocol, each virtual machine
implements the functionality of an IP router, each operating
in its own network domain. Each virtual network machine is
allocated a portion of the device's native resources. Each
virtual network machine runs the IP protocol stack. Each
virtual network machine stores its address, policy and
control information separately from the others. Thus, each
virtual network machine can operate independently of the
other virtual network machines, even though it shares native
computer resources with the other virtual network machines.
This virtual network machine based organization of
information therefore provides greater isolation between
network domains.
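
The separation of per-domain databases can be illustrated with a small model. The following is a minimal sketch, not the patented implementation: the class name VirtualNetworkMachine, the route entries and the next-hop labels are all hypothetical, and a Python dictionary stands in for each VNM's network database. It shows that the same destination address can resolve differently in two domains because each lookup consults only its own database.

```python
import ipaddress


class VirtualNetworkMachine:
    """Hypothetical model of one VNM: a name plus its own private routes."""

    def __init__(self, name):
        self.name = name
        self.routes = {}  # per-VNM database; never shared between VNMs

    def add_route(self, prefix, next_hop):
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def lookup(self, address):
        """Longest-prefix match within this VNM's database only."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


vnm1 = VirtualNetworkMachine("VNM1")
vnm2 = VirtualNetworkMachine("VNM2")

# Both domains may reuse the same address space without conflict.
vnm1.add_route("10.0.0.0/8", "vn1-gateway")
vnm2.add_route("10.0.0.0/8", "vn2-gateway")
vnm2.add_route("10.1.0.0/16", "vn2-branch")

print(vnm1.lookup("10.1.2.3"))
print(vnm2.lookup("10.1.2.3"))
```

Because no route table is shared, no separation policy between VNM1 and VNM2 has to be written; the isolation follows from the structure.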

Each virtual machine has its own network database that
contains its control information. VNM0 has a network
database that causes it to operate as a router that routes
information within the Internet network domain. VNM1 has
a network database that causes it to operate as a router that
routes resource information within network domain VN1.
VNM2 has a network database that causes it to operate as a
router that routes resource information within network
domain VN2. High Speed Networks, TCP/IP and ATM
Design Principles, by William Stallings, Prentice Hall, 1998,
provides detailed discussion of router functions and the
functions of other network devices.

The VNMs of FIG. 13 may employ multiple different kinds of
layer 1 (physical) media to attach to one or more networks.
In a presently preferred embodiment, these physical
connections include ATM OC-3c/STM1, ATM DS-3/E3, DS-3 Clear
Channel, HSSI and 10/100 Base-T TX. Resource information is
transmitted across these physical connections, such as phone
lines, DSL or ADSL for example, to and from VNM0, VNM1 and
VNM2 using layer 2 (data link) protocols. There are layer 2
LAN (local area network) technology and layer 2 WAN (wide
area network) technology protocols. Examples of LAN
technologies include Ethernet and IEEE 802.3, Fast Ethernet,
Token Ring and Fiber Distributed Data Interface. Examples of
WAN technologies include Asynchronous Transfer Mode (ATM),
Frame Relay, X.25, Point-to-Point Protocol (PPP), Integrated
Services Digital Network (ISDN) and High-Level Data Link
Control (HDLC). Intermediate stations communicate with each
other using layer 3 protocols. Layer 3 protocols include
Internet Protocol (IP), AppleTalk and Internetwork Packet
Exchange (IPX). Thus, for example, VNM0, VNM1 and VNM2 each
employ one or more layer 3 protocols to communicate with
other stations of the network(s) to which they are attached.

Thus, the three virtual machines and the different network 
domains associated with them are isolated from each other 
in the network device intermediate station of FIG. 13, and 
the task of exercising administrative control can be simpli- 
fied significantly. Since there is no monolithic database that 
must be maintained to control information transfers across 
all of the networks to which the three VNMs are attached, 
the task of administering each database is simplified. 

The virtual network machine based organization also 
simplifies the administration, lowering the cost of operating 
all three networks. The organization of information along 
network domain boundaries eliminates the notion of infor- 
mation from two domains residing under a single monolithic 
structure, and thereby eliminates the need to define inter- 
domain policies to manage the separation of information 
within a monolithic database structure. The separation 
policy mechanisms represented by the dotted lines cutting 
through the database of FIG. 12 are gone, and a whole set of 
administrative chores disappears with them. There will be no 
need to define the complicated inter-domain policies, and no 
cost associated with administering them. The amount of 
information that needs to be configured by the administra- 
tors is greatly reduced in size and complexity using this 
method of database organization. 

Other benefits can be realized through greater efficiencies
in the implementation of such network devices that are
possible with this method of network database organization.
Further efficiencies are realized through the elimination of
the complicated inter-domain policies in virtually all
functions of the device. Essentially, each of the virtual
machines VNM0, VNM1 and VNM2 operates as a separate,
independent network device, performing networking functions
in its own domain.

Dynamic Binding 

The drawing of FIG. 14 shows another illustrative
embodiment of the invention. The IP network device of FIG.
14 implements a router that includes three network
interfaces NIF3-0, NIF3-1 and NIF3-2. The network device also
has a layer 1/2 connection to an Ethernet service. The
network device also has a layer 1/2 connection to a virtual
circuit service. An Ethernet service sub-interface data
structure Eth1 provides the layer 2 Ethernet connection, and
a VCC1 sub-interface data structure provides the layer 2
VCC1 connection. For example, the VCC1 sub-interface data
structure of FIG. 14 may be kept in a table that identifies
all virtual circuit connections, each defining the
encapsulation protocol, the packet or cell, data compression
technique and the particular layer 2 protocol used on that
circuit. The Ethernet sub-interface data structure may
include the Ethernet address of the local connection and
other parameters to control transmission and receipt of
information on the Ethernet segment. A binding data
structure B3-0 binds the Ethernet sub-interface data
structure to NIF3-0. A binding data structure B3-2 binds the
VCC1 sub-interface data structure to NIF3-2. The Ethernet
and VCC1 sub-interface data structures are labeled with the
prefix "sub" because they are layer 2 constructs which are
below the layer 3 interface constructs in the ISO scheme.
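
The arrangement of sub-interfaces, interfaces and bindings described for FIG. 14 can be modeled in a few lines. This is a minimal sketch under stated assumptions: the class names SubInterface, Interface and Binding are hypothetical, each sub-interface is assumed to carry at most one binding, and object references stand in for the pointers between data structures. It builds the FIG. 14 bindings and then performs the change summarized for FIG. 15, in which B3-0 is removed and a binding B3-1 to NIF3-1 is created.

```python
class SubInterface:
    """Layer 2 construct such as Eth1 or VCC1 (at most one binding here)."""

    def __init__(self, name):
        self.name = name
        self.binding = None


class Interface:
    """Layer 3 construct such as NIF3-0; may carry several bindings."""

    def __init__(self, name):
        self.name = name
        self.bindings = []


class Binding:
    """Refers to both ends; each end refers back to the binding."""

    def __init__(self, name, sub, nif):
        if sub.binding is not None:
            raise ValueError(f"{sub.name} is already bound")
        self.name, self.sub, self.nif = name, sub, nif
        sub.binding = self
        nif.bindings.append(self)

    def delete(self):
        # Only the binding changes; neither end is reconfigured.
        self.sub.binding = None
        self.nif.bindings.remove(self)


eth1, vcc1 = SubInterface("Eth1"), SubInterface("VCC1")
nif = {n: Interface(n) for n in ("NIF3-0", "NIF3-1", "NIF3-2")}

# FIG. 14: Eth1 is bound to NIF3-0 and VCC1 to NIF3-2.
b3_0 = Binding("B3-0", eth1, nif["NIF3-0"])
b3_2 = Binding("B3-2", vcc1, nif["NIF3-2"])

# FIG. 15: B3-0 is eliminated and B3-1 is created.
b3_0.delete()
b3_1 = Binding("B3-1", eth1, nif["NIF3-1"])
```

The rebinding touches only Binding objects; the Interface and SubInterface objects keep their identity and settings, which is the property the description calls dynamic binding.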

Referring to FIG. 14, binding data structure B3-0
establishes a layer 2/3 connection between the Ethernet
sub-interface data structure and NIF3-0, and binding data
structure B3-2 establishes a layer 2/3 connection between
the VCC1 sub-interface data structure and NIF3-2. Binding
data structure B3-0 causes information transferred across
the Ethernet connection to be processed through to NIF3-0.
An IP Forwarding/Routing database controls routing of the
information out the correct interface. Binding data
structure B3-2 causes the information transferred across the
VCC1 connection to be processed through NIF3-2.

The VCC1 sub-interface data structure instantiates a
virtual circuit connection to the network device of FIG. 14.
A virtual circuit connection such as that in FIG. 14 can be
created in accord with any of several technologies. A
sub-interface data structure like that in FIG. 14 stores the
network device's identity of the virtual circuit attached to
it. Many virtual circuits can be established across a single
physical connection, and many virtual circuits can be
connected to a single network device.

FIG. 15 depicts the same intermediate station as in FIG.
14, except the binding B3-0 has been eliminated, and
binding B3-1 has been created. Binding B3-1 associates the
Ethernet sub-interface data structure Eth1 with interface
NIF3-1. Interface NIF3-2 remains bound to the sub-interface
data structure VCC1. The interface NIF3-0 is not bound to
any layer 2 construct. It should be noted that an unbound
interface construct generally would represent a
mis-configuration in a typical earlier intermediate station.

FIG. 16 depicts yet another illustrative embodiment of the
invention. The network device of FIG. 16 implements an IP
router function and a bridging function. The router includes
two interfaces NIF4-1 and NIF4-2. The bridge includes a
bridge interface BR4-0. A network database that implements
the bridge function includes a list of network stations
reachable through each of the bridge's interfaces. The
network device also has a layer 1/2 connection to an
Ethernet service. The network device also has a layer 1/2
connection to a virtual circuit service VCC1. An Ethernet
service sub-interface data structure Eth1 provides
information concerning the Ethernet connection, and a VCC1
sub-interface data structure provides information concerning
the VCC1 connection. A binding data structure B4-0 binds the
Ethernet sub-interface data structure to NIF4-0. A binding
data structure B4-2 binds the VCC1 sub-interface data
structure to NIF4-2. NIF4-1 is unbound.

FIG. 17 depicts the same network device as in FIG. 16,
except the binding B4-0 has been eliminated, and binding
B4-1 has been created. Binding B4-1 associates the Ethernet
sub-interface data structure with interface NIF4-1 of virtual
router VM4. Interface NIF4-2 remains bound to the
sub-interface data structure VCC1. The interface BR4-0 is not
bound to any layer 2 construct. These changes in binding
effectively redefine the service available on the Ethernet
segment from a bridged or layer 2 service, to a routed or
layer 3 service. In a presently preferred embodiment of the
invention, these bindings can be changed without
reconfiguration of any other interface construct or circuit
construct. In a typical earlier intermediate station, the
bindings between the higher and lower layers are implicit,
and a change in the implicit bindings applied to the bridge
and router interface constructs typically would have
required a modification of these interface constructs. A
present embodiment of the invention does not require such
modification.

FIG. 18 depicts the same network device as in FIG. 14,
except the binding B3-0 has been eliminated and binding
B3-2A has been created. Binding B3-2A associates the
Ethernet sub-interface data structure with the NIF3-2
interface. Binding B4-2 associates the VCC1 sub-interface
data structure with NIF3-2. Interfaces NIF3-0 and NIF3-1 are
unbound. This change in bindings causes both the Ethernet
and the virtual circuit lower layer services to be associated
with a single higher layer IP construct, NIF3-2.

FIG. 19 shows a network device which comprises a
computer which instantiates multiple virtual network
machines VNM5 and VNM6. VNM5 implements IP router
functionality. It includes network interfaces NIF5-0 and
NIF5-1. VNM6 also implements IP router functionality. It
includes two interfaces NIF6-0 and NIF6-1. The network
device of FIG. 19 has two layer 1/2 connections to a virtual
circuit service. Sub-interface data structure VCC1
instantiates one of the connections to the device.
Sub-interface VCC2 instantiates the other connection to the
device. A binding data structure B5-0 binds the VCC1
sub-interface data structure to NIF5-0 of VNM5. A binding
data structure B6-2 binds the VCC2 sub-interface data
structure to interface NIF6-1 of VNM6. VNM5 and VNM6 each
use the IP protocol suite to communicate with other stations
of the network(s) to which they are attached.

FIG. 20 depicts the same network device as in FIG. 19,
except the binding B5-0 has been eliminated and binding
B6-0 has been created. The binding B6-0 data structure
associates VCC1 sub-interface data structure with NIF6-0 of
VNM6. Binding data structure B6-1 binds sub-interface data
structure VCC2 to NIF6-1. Neither of the VNM5 interfaces
NIF5-0 and NIF5-1 are bound.

In FIGS. 14 to 20, bindings are shown as data structures
connected to other data structures by line segments. In one
preferred embodiment, the line segments each represent a
pair of bi-directional pointers; the first pointer points from
the binding to the higher or lower layer data structures and
the second is opposite the first, pointing from the higher or
lower layer data structure to the binding data structure.
Alternatively, the binding could be implemented as indices
or identifiers in a table, for example. Dynamic binding is
accomplished by creating and/or deleting binding data
structures and/or changing the values of the pointers or
indices so they operate on different data structures. It will
be appreciated that actual changing of the bindings can be
accomplished through entries in a command line interface to
the network device or automatically by snooping the
information flow through the device, for example.

The illustrative drawing of FIG. 21 is a generalized block
diagram of a subscriber management system in accordance
with a presently preferred embodiment of the invention. A
subscriber is a user of network services. The system includes
a computer with layer 1/2 connections to subscriber end
stations and with layer 1/2 connections to network devices
that provide access to other networks.

The system can form a multiplicity of layer 1/2 subscriber
end station connections. In a present embodiment, the layer
1/2 connections to subscriber end stations include virtual
circuit connections. The system memory stores a multiplicity
of sub-interface data structures that instantiate the
multiplicity of virtual circuit connections through which
subscriber end stations communicate with the subscriber
management system.

The system instantiates in memory a plurality of virtual
network machines. Each VNM of the embodiment of FIG.
21 implements the functionality of a router. There are nine
illustrative VNM routers shown in FIG. 21 labeled
VNMr-1 through VNMr-9. Each VNM router includes interfaces
in its database. Each VNM router runs at least one layer 3
protocol suite. Each VNM router may run one or more
adaptive routing algorithms. The interfaces of each VNM
router provide access to a network that is isolated from the
networks accessed through the interfaces of the other VNM
routers. For example, the interface to VNMr-4 provides
layer 3 access to the network that includes ISP#2. The
interface to VNMr-5 provides layer 3 access to the network
that includes Corporate-Private-Network#A. The interface
to VNMr-6 provides layer 3 access to the network that
includes ISP#4. The networks with ISP#2,
Corporate-Private-Network#A and ISP#4 are isolated from each
other. The databases associated with VNMr-4, VNMr-5 and
VNMr-6 control access to networks across these respective
interfaces. Each of these three VNM databases can be
administered separately.
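
The table-based alternative mentioned in connection with FIGS. 14 to 20, in which bindings are held as identifiers rather than pointers, might look like the following. This is a minimal sketch under stated assumptions: the dictionary layout and the helper names rebind and bound_interfaces are hypothetical. It reproduces the change from FIG. 19 to FIG. 20, where the VCC1 sub-interface moves from NIF5-0 of VNM5 to NIF6-0 of VNM6.

```python
# Binding table: sub-interface identifier -> (VNM identifier, interface identifier).
# A sub-interface with no entry is unbound.
fig19 = {
    "VCC1": ("VNM5", "NIF5-0"),
    "VCC2": ("VNM6", "NIF6-1"),
}


def rebind(table, sub_interface, vnm, interface):
    """Return a new table in which sub_interface points at (vnm, interface)."""
    updated = dict(table)
    updated[sub_interface] = (vnm, interface)
    return updated


def bound_interfaces(table, vnm):
    """List the interfaces of one VNM that currently have a binding."""
    return sorted(nif for owner, nif in table.values() if owner == vnm)


# FIG. 20: VCC1 is rebound to NIF6-0 of VNM6; VCC2 is left alone.
fig20 = rebind(fig19, "VCC1", "VNM6", "NIF6-0")

print(bound_interfaces(fig20, "VNM5"))
print(bound_interfaces(fig20, "VNM6"))
```

Changing one table entry moves a subscriber's circuit from one isolated VNM to another without editing either VNM's own database.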

In operation a subscriber might establish a point-to-point
connection with the subscriber management system. A
server running software that implements authentication,
authorization and accounting (AAA) protocols searches for a
record that identifies the user. Authentication is the
process of identifying and verifying a user. For instance, a
user might be identified by a combination of a username and
a password or through a unique key. Authorization determines
what a user can do after being authenticated, such as
gaining access to certain end stations or information
resources. Accounting is recording user activity. In the
present embodiment, AAA involves client software that runs
on the subscriber management system and related access
control software that runs either locally or on a remote
server station attached to the network. The present
embodiment employs Remote Authentication Dial-In User
Service (RADIUS) to communicate with a remote server. An
example of an alternative AAA protocol is Terminal Access
Controller Access Control System (TACACS+). RADIUS and
TACACS+ are protocols that provide communication between the
AAA client on a router and access control server software.
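
The three AAA steps can be separated in a toy model. The sketch below is illustrative only and does not speak RADIUS or TACACS+: the in-memory RECORDS table, the user "alice", the fixed salt and the function names are all hypothetical, standing in for whatever the access control server actually stores.

```python
import hashlib
import hmac

SALT = b"example-salt"  # assumption: a fixed salt keeps the sketch short


def _digest(password: str) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), SALT, 100_000)


# Hypothetical subscriber records: stored secret plus permitted networks.
RECORDS = {"alice": {"secret": _digest("s3cret"), "networks": {"ISP#2"}}}
ACCOUNTING_LOG = []


def authenticate(username: str, password: str) -> bool:
    """Authentication: identify and verify the user."""
    record = RECORDS.get(username)
    if record is None:
        return False
    return hmac.compare_digest(record["secret"], _digest(password))


def authorize(username: str, network: str) -> bool:
    """Authorization: decide what an authenticated user may access."""
    record = RECORDS.get(username)
    return record is not None and network in record["networks"]


def connect(username: str, password: str, network: str) -> bool:
    """Run all three steps; accounting records every attempt."""
    ok = authenticate(username, password) and authorize(username, network)
    ACCOUNTING_LOG.append((username, network, "accepted" if ok else "rejected"))
    return ok
```

A call such as connect("alice", "s3cret", "ISP#2") succeeds, while a wrong password or an unlisted network is rejected; either way the attempt is appended to ACCOUNTING_LOG.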

The subscriber record includes information concerning
the network to which the subscriber's virtual circuit
connection should be bound. Typically, the subscriber will
employ a PVC. Based upon the information in the subscriber
record, a binding data structure, like that described in
reference to FIGS. 14 to 20, will be created to associate the
sub-interface data structure that instantiates the PVC in the
subscriber management system memory with the interface to
the VNM router that accesses the network identified for the
subscriber in the subscriber record.

Moreover, the subscriber record may provide multiple
possible binding options for the subscriber. For instance, the
subscriber may specify the creation of a binding which is to
be employed during business hours and which binds the
subscriber to VNMr-5, which provides layer 3 network access
to Corporate-Private-Network#A. The same record may specify
another binding which is to be employed only during
non-business hours and which binds to VNMr-4, which provides
layer 3 network access to ISP#2. Thus, the bindings can be
changed. They are dynamic.
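
The time-dependent choice between the two bindings can be sketched as a lookup on the subscriber record. This is an illustrative sketch only: the record layout, the subscriber name and the function select_vnm are hypothetical, and the 09:00 to 17:00 business-hours window is an assumption, since the description does not define business hours.

```python
from datetime import time

BUSINESS_START = time(9, 0)   # assumed start of business hours
BUSINESS_END = time(17, 0)    # assumed end of business hours

# Hypothetical subscriber record with two binding options.
subscriber_record = {
    "subscriber": "example-subscriber",
    "bindings": {
        "business": "VNMr-5",      # Corporate-Private-Network#A
        "non_business": "VNMr-4",  # ISP#2
    },
}


def select_vnm(record, now: time) -> str:
    """Pick the VNM router the subscriber's PVC should be bound to."""
    in_business_hours = BUSINESS_START <= now < BUSINESS_END
    key = "business" if in_business_hours else "non_business"
    return record["bindings"][key]


print(select_vnm(subscriber_record, time(10, 30)))
print(select_vnm(subscriber_record, time(20, 0)))
```

Whenever the selected VNM differs from the current one, the system would delete the existing binding data structure and create a new one toward the selected VNM router's interface.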

Various modifications to the preferred embodiments can
be made without departing from the spirit and scope of the
invention. Thus, the foregoing description is not intended to
limit the invention which is described in the appended
claims in which:

What is claimed is:

1. A computer implemented method comprising: 

routing Internet Protocol (IP) packets within a first
Internet Service Provider's (ISP's) domain from a single
network device with a first database, the first database
including addresses of the first ISP's domain; and

routing IP packets within a second ISP's domain from the
single network device with a second database, the
second database being separate from the first database
and including addresses of the second ISP's domain.

2. The computer implemented method of claim 1, wherein 
the first database also includes control and policy informa- 
tion for the first ISP's domain and the second database 
includes control and policy information for the second ISP's 
domain. 

3. The computer implemented method of claim 1 further 
comprising connecting a subscriber to the first ISP's domain 
with an authentication, authorization and accounting proto- 
col. 

4. The computer implemented method of claim 1 further 
comprising: 

routing IP packets within a corporation's domain from the 
single network device with a third database, the third 
database being separate from the first and second 
databases, wherein said third database includes 
addresses of the corporation's domain. 

5. The computer implemented method of claim 1 further 
comprising: 

providing the corporation administrative control of the 
third database, but not the first and second databases; 

providing the first ISP administrative control of the first 
database, but not the second and third databases; and 

providing the second ISP administrative control of the 
second database, but not the first and third databases. 

6. The method of claim 1 further comprising routing the 
packets within the first ISP's domain with a global database 
that includes globally known addresses if the packets cannot 
be routed within the first ISP's domain with the first data- 
base. 

7. A memory having a set of one or more programs stored 
thereon to cause a single network device to perform opera- 
tions comprising: 

maintaining a first database separately from a second
database in the single network device, the first database
having addresses for a first Internet Service Provider's
(ISP's) domain and the second database having
addresses for a second ISP's domain;

routing Internet Protocol (IP) packets within the first 
ISP's domain from the single network device with the 
first database; and 

routing IP packets within the second ISP's domain from 
the single network device with the second database. 

8. The memory of claim 7 further comprising providing 
access to a subscriber to the first ISP's domain with an 
authentication, authorization, and accounting protocol. 

9. The computer implemented method of claim 7 further 
comprising: 

maintaining a third database separately from the first and 
second databases, wherein the third database has 
addresses for a corporation's domain; and 

routing IP packets within the corporation's domain with 
the third database from the single network device. 

10. The computer implemented method of claim 9 further 
comprising: 

providing the first ISP administrative control of the first 
database, but not the second or third databases; 

providing the second ISP administrative control of the 
second database, but not the first or third databases; and 

providing the corporation administrative control of the 
third database, but not the first or second databases. 

11. The memory of claim 7 wherein the set of one or more
programs cause the single network device to perform
operations further comprising:

maintaining a third database separately from the first and
second databases, wherein the third database has
addresses of a backbone; and

routing IP packets within the first ISP's domain with the
third database if they cannot be routed with the first
database.

12. A single network device comprising: 
an electronic memory having 

a first database of network addresses of a first network 
domain that is administered by a first Internet Ser- 
vice Provider (ISP); 

a second database of network addresses of a second 
network domain that is administered by a second 
ISP, the second database being isolated from the first 
database; and 

a set of one or more processors to execute a set of 
instructions that cause the single network device to 
route a first set of packets of the first network domain 
with the first database and to route a second set of 
packets of the second network domain with the second 
database. 

13. The single network device of claim 12 wherein the 
packets are IP packets. 

14. The single network device of claim 12 wherein the 
packets are layer 2 packets. 

15. The single network device of claim 12 wherein the 
first set of packets are transmitted from a subscriber of the 
first ISP and the second set of packets are transmitted from 
a subscriber of the second ISP. 

16. The single network device of claim 12 further com- 
prising: 

the electronic memory further having 

a third database to store network addresses of a third 
network domain that is administered by a 
corporation, the third database being isolated from 
the first and second databases; and 

the set of processors to execute the set of instructions
to further cause the single network device to route a
third set of packets of the third network domain with
the third database.

17. The single network device of claim 12 further com- 
prising: 

the electronic memory having a third database of 
addresses of a network provider that is administered by 
the network provider, the third database being isolated 
from the first and second databases; and 

the set of processors to execute the set of instructions that 
further cause the single network device to route the first 
set of packets with the third database if they cannot be 
routed with the first database. 

18. A method comprising: 

routing packets for a first set of subscribers with a first
virtual router, and routing packets for a second set of
subscribers with a second virtual router, the first and
second virtual routers being isolated from each other
within a single network device, the first set of
subscribers subscribing to a first Internet Service
Provider (ISP) and the second set of subscribers
subscribing to a second ISP;

providing administrative control of the first virtual router,
which includes a first network database, used by the
first virtual router to route packets, of network device
addresses within the first ISP's domain and control and
policy information for the first ISP's domain, to the first
ISP; and

providing administrative control of the second virtual
router, which includes a second network database, used
by the second virtual router to route packets, of network
device addresses within the second ISP's domain and
control and policy information for the second ISP's
domain, to the second ISP, wherein the first ISP does
not have administrative control of the second network
database and the second ISP does not have
administrative control of the first network database.

19. The method of claim 18 wherein the packets are layer
2 packets.

20. The method of claim 18 wherein providing
administrative control of the first virtual router comprises
allowing the first ISP to modify the first network address
database and the control and policy information governing
the first ISP's domain.

21. The method of claim 18 further comprising:

providing a network provider administrative control of a
global virtual router including a global network
database in the single network device.

22. The method of claim 18 further comprising:

routing packets for a third set of subscribers with a third
virtual router; and

providing administrative control of the third virtual router,
which includes a third network database of network
device addresses within a corporation's domain and
control and policy information for the corporation's
domain, wherein the corporation has administrative
control of the third virtual router but not the first and
second virtual routers.

23. The method of claim 22 further comprising providing
a network provider access to the first, second and third
network databases and a global network database, said
global network database being in said single network device.

24. The method of claim 18 wherein the packets are layer
3 packets.

25. The method of claim 18 further comprising:

connecting the first and second set of subscribers to the
single network device in accordance with an
authorization, authentication and accounting protocol.
40 26. An electronic memory encoded with a set of 
instructions, which when executed on a single network 
device, cause said single network device to perform opera- 
tions comprising: 
creating a plurality of collections of processes and mecha-
nisms for implementing router functionality, each of
the plurality of collections of processes and mecha-
nisms operating on a different network database includ-
ing addresses and control and policy information;

separately storing the network database of each of the
plurality of collections of processes and mechanisms;
and

each of the plurality of collections of processes and
mechanisms routing packets within a different admin-
istrative domain with its network database and in
accordance with its control and policy information.

27. The electronic memory of claim 26 wherein the 
packets are layer 2 packets. 

28. The electronic memory of claim 26 wherein each of
the plurality of collections of processes and mechanisms
runs its own IP stack.

29. The electronic memory of claim 26 wherein at least 
one of the different administrative domains is administered 
by an Internet Service Provider. 

30. The electronic memory of claim 26 wherein at least
one of the different administrative domains is administered 
by a corporation. 
31. A single network device comprising:

a plurality of virtual network machines that are individually
isolated, each of the plurality of virtual network
machines to route packets within a different adminis-
trative domain with
a network address database for the different adminis-
trative domain, and
control and policy information for the different admin-
istrative domain;

a first port to transmit and receive said packets to and from
subscribers; and

a second port to transmit and receive said packets to and 
from the Internet. 

32. The single network device of claim 31 wherein the 
plurality of virtual network machines are virtual IP routers.

33. The single network device of claim 31 wherein the 
different administrative domains include an Internet Service 
Provider domain and a corporate domain. 

34. The single network device of claim 31 further com- 
prising a third port to transmit and receive a second set of 
packets to and from a corporate network domain.

35. A single network device comprising: 
communication hardware; 

a set of one or more processors coupled with the com- 
munication hardware; and 

an electronic memory coupled with the communication
hardware and the set of processors, the electronic
memory encoded with a set of instructions to cause the
set of processors to,

host a first virtual router that includes a first network
database of network device addresses within a first
Internet Service Provider's (ISP's) domain and con-
trol and policy information for the first ISP's
domain, and

host a second virtual router, isolated from the first
virtual router, that includes a second network data-
base of network device addresses within a second
ISP's domain and control and policy information for
the second ISP's domain,

route packets for a first set of subscribers with the
communication hardware and the first virtual router,
wherein the first set of subscribers subscribe to the
first ISP,

route packets for a second set of subscribers with the
communication hardware and the second virtual
router, wherein the second set of subscribers sub-
scribe to the second ISP,

provide the first ISP administrative control of the first
virtual router but not the second virtual router, and

provide the second ISP administrative control of the
second virtual router, but not the first virtual router.

36. The single network device of claim 35 further com- 
prising the electronic memory to host a global network 
database. 

37. The single network device of claim 35 further com-
prising:

the electronic memory to host a third virtual router that 
includes a third network database of network device 
addresses within a corporation's domain and control 
and policy information for the corporation's domain, 
and the set of instructions to further cause the set of
processors to, 

route packets for a third set of subscribers with the 
communication hardware and the third virtual router, 
and 

provide the corporation administrative control of the
third virtual router, but not the first and second 
virtual routers. 



38. The single network device of claim 35, wherein the set 
of instructions further cause the set of processors to switch 
packets for a third set of subscribers with the communication 
hardware and the first virtual router. 

39. The single network device of claim 35 wherein the 
packets are layer 3 packets. 

40. A network comprising: 

a set of one or more networks; 

a set of one or more end stations communicating, for a 
first set of one or more subscribers of a first Internet 
Service Provider (ISP) and for a second set of one or 
more subscribers of a second ISP, packets; 

a single network access device coupled between the set of 
networks and the set of end stations, the single network 
access device having, 
communication hardware; 

an electronic memory coupled with the communication 
hardware, the electronic memory having stored 
therein, 

a first network database, controllable for administra- 
tion by the first ISP but not the second ISP, 
including network device addresses and control 
and policy information for the first ISP, 

a second network database, controllable for admin- 
istration by the second ISP but not the first ISP, 
including network device addresses and control 
and policy information for the second ISP, 
wherein the first network database and the second
network database are isolated from each other; and 

a set of one or more processors, coupled with the 
communication hardware and the electronic 
memory, routing said packets being communi- 
cated for the first set of subscribers with the 
communication hardware and a first virtual router 
that includes the first network database, and rout-
ing said packets being communicated for the sec-
ond set of subscribers with the communication
hardware and a second virtual router that includes 
the second network database. 

41. The network of claim 40 further comprising the 
electronic memory having stored therein a global network 
database. 

42. The network of claim 40 further comprising:

the set of one or more networks including a virtual 
network of a corporation; 

the set of end stations communicating, for a third set of 
one or more subscribers of the corporation, packets; 

the electronic memory having stored therein a third net- 
work database, controllable for administration by the 
corporation but not the first ISP nor the second ISP, 
including network device addresses and control and 
policy information for the corporation; and 

the set of processors routing said packets being commu- 
nicated for the third set of subscribers with the com- 
munication hardware and a third virtual router that 
includes the third network database. 

43. The network of claim 40 further comprising the set of 
processors switching packets for a third set of one or more 
subscribers with the communication hardware and the first 
virtual router. 

44. The network of claim 40 wherein the packets are layer 
3 packets. 

45. An electronic memory encoded with a set of 
instructions, which when executed on a single network 
device, cause said single network device to perform opera- 
tions comprising: 
routing packets for a first set of subscribers with a first
virtual router, and routing packets for a second set of
subscribers with a second virtual router, the first and
second virtual routers being isolated from each other
within the single network device, the first set of sub-
scribers subscribing to a first Internet Service Provider
(ISP) and the second set of subscribers subscribing to
a second ISP;

providing administrative control of the first virtual router,
which includes a first network database of network
device addresses within the first ISP's domain and
control and policy information for the first ISP's
domain, to the first ISP; and

providing administrative control of the second virtual
router, which includes a second network database of
network device addresses within the second ISP's
domain and control and policy information for the
second ISP's domain, to the second ISP, wherein the
first ISP does not have administrative control of the
second network database and the second ISP does not
have administrative control of the first network data-
base.

46. The electronic memory of claim 45, wherein the 
operations further comprise: 

routing packets for the first and second set of subscribers 
with a third virtual router, the third virtual router
including a global network database. 

47. The electronic memory of claim 45 wherein the 
operations further comprise: 

routing packets for a third set of subscribers with a third
virtual router, the third virtual router being isolated 
from the first and second virtual router within the single 
network device; and 

providing a corporation administrative control of the third 
virtual router, which includes a third network database
of network device addresses of the corporation and 
control and policy information for the corporation, 
wherein the corporation has administrative control of 
the third virtual router but not the first and second 
virtual routers. 

48. The electronic memory of claim 45, wherein the 
operations further comprise: 

switching packets for a third set of one or more subscrib- 
ers with the first virtual router.

49. A method in a single network device comprising:
creating a plurality of collections of processes and mecha-
nisms for implementing router functionality, each of
the plurality of collections of processes and mecha-
nisms operating on a different network database includ-
ing addresses and control and policy information;

separately storing the network database of each of the 
plurality of collections of processes and mechanisms; 
and 

each of the plurality of collections of processes and 
mechanisms routing packets within a different admin-
istrative domain with its network database and in
accordance with its control and policy information. 

50. The computer implemented method of claim 49 
wherein at least one of the plurality of collections of 
processes and mechanisms switching packets within its
administrative domain with its network database. 

51. The computer implemented method of claim 49 
wherein each of the plurality of collections of processes and 
mechanisms runs its own IP stack. 

52. The computer implemented method of claim 49
wherein at least one of the different administrative domains
is administered by an Internet Service Provider.
53. The computer implemented method of claim 49
wherein at least one of the different administrative domains 
is administered by a corporation. 

54. The computer implemented method of claim 49 
wherein the different network database of each of the 
plurality of collections of processes and mechanisms 
includes addressing information, control information, and 
policy information. 

55. A single network device comprising: 
a set of one or more processors; and 

an electronic memory coupled with the set of processors, 
the electronic memory 

having a set of instructions to cause the set of proces- 
sors to, 

create a plurality of collections of processes and 
mechanisms, each of the plurality of collections of 
processes and mechanisms to operate on a different 
network database including control and policy 
information, and to route packets within a different 
administrative domain with its network database and 
in accordance with its control and policy 
information, and 

separately store the different network database of each 
of the plurality of collections of processes and 
mechanisms. 

56. The single network device of claim 55 further com- 
prising at least one of the collections of processes and 
mechanisms to switch packets with its network database and 
in accordance with its control and policy information. 

57. The single network device of claim 55 wherein the set 
of instructions further cause the set of processors to inde-
pendently run an Internet Protocol stack for each of the
plurality of collections of processes and mechanisms. 

58. The single network device of claim 55 wherein at least 
one of the different administrative domains is administered 
by an Internet Service Provider. 

59. The single network device of claim 55 wherein at least 
one of the different administrative domains is administered 
by a corporation. 

60. The single network device of claim 55 wherein the 
different network database of each of the plurality of col- 
lections of processes and mechanisms is to include address- 
ing information, control information, and policy informa- 
tion. 

61. A network comprising: 

a set of one or more networks; 

a set of one or more end stations communicating packets 
with the set of networks; and 

a single network device coupled between the set of 
networks and the set of end stations, the single network 
device having a plurality of collections of processes 
and mechanisms, each of the plurality of collections of 
processes and mechanisms, 

operating on a different network database including 
addresses and control and policy information, 
wherein the network database operated on by each of 
the collection of processes and mechanisms is stored 
separately, and 

routing packets within a different administrative 
domain with its network database and in accordance 
with its control and policy information. 

62. The network of claim 61 further comprising at least 
one of the plurality of collections of processes and mecha- 
nisms switching packets with its network database and in 
accordance with its control and policy information. 

63. The network of claim 61 each of the plurality of
collections of processes and mechanisms independently 
running an IP stack. 
64. The network of claim 61 wherein at least one of the
different administrative domains is administered by an Inter- 
net Service Provider. 

65. The network of claim 61 wherein at least one of the 
different administrative domains is administered by a cor-
poration.

66. A network comprising: 

a set of one or more networks;

a set of one or more end stations communicating packets 
with the set of networks; and

a single network device coupled between the set of 
networks and the set of end stations, the single network 
device having, 

a first virtual network machine transmitting certain of 
said packets for a first subscriber in accordance with 
a first network database of a first administrative 
domain, the first database having addressing and
policy information of the first administrative domain, 
and 

a second virtual network machine, which is isolated 
from the first virtual network machine, transmitting 
certain packets for a second subscriber in accordance 
with a second network database, the second network 
database having addressing and policy information
for a second administrative domain. 

67. The network of claim 66 wherein the first adminis- 
trative domain is administered by a corporation. 

68. The network of claim 66 wherein the first adminis- 
trative domain is administered by an Internet Service Pro-
vider.

69. The network of claim 66 wherein the first adminis- 
trative domain provides a first service and the second 
administrative domain provides a second service.

70. The network of claim 66 wherein the addressing
information in the first database includes layer 2 addressing 
information. 

71. The network of claim 66 wherein the addressing 
information in the first database includes layer 3 addressing
information. 

72. A single network device comprising: 
a first set of one or more ports to receive IP packets from
a first and second set of one or more subscribers;
a second set of one or more ports to transmit IP packets
over a first network domain;
a machine-readable medium having stored therein a set of
instructions to cause the single network device to,
instantiate a first and second virtual router, which are
virtually-independent but share a set of physical
resources within the single network device,
the first virtual router to route within a second
network domain, which is layered upon the first
network domain, IP packets from the first set of
subscribers using a first network database that
includes IP addresses, control and policy informa-
tion defined for the second network domain, and
the second virtual router to route within a third
network domain, which is layered upon the first
network domain and shares the first network
domain's physical resources with the second net-
work domain, IP packets from the second set of
subscribers using a second network database that
includes IP addresses and control and policy infor-
mation defined for the third network domain,
maintain separation between the first and second net-
work databases so as to avoid management of inter-
domain policies, wherein avoidance of inter-domain
policies eases administrative tasks, 

provide for independent administration of the first and 
second network databases, wherein independent 
administration maintains administrative integrity of 
the first and second network databases. 

73. The single network device of claim 72 wherein the 
first set of subscribers are subscribers of a corporate network 
and the second set of subscribers are subscribers of an 
Internet service provider. 

74. The single network device of claim 72 wherein the 
first network domain is a layer 3 network domain and the 
second and third network domains are layer 3 network 
domains. 

75. The single network device of claim 72 further com- 
prising the set of instructions to cause the single network 
device to tunnel IP packets of the first set of subscribers 
between separate physical locations of a virtual private 
network with the first virtual router. 

76. The single network device of claim 72 further com- 
prising the set of instructions to cause the single network 
device to process the first and second set of subscribers in 
accordance with an authorization, authentication and 
accounting protocol. 

77. A single network device comprising: 

a first set of one or more ports to receive IP packets from 
subscribers; 

a second set of one or more ports to transmit IP packets
over a first network domain;

a machine-readable medium having stored therein a set of
instructions to cause the single network device to,

instantiate different virtual routers for different network 
domains, which are layered upon the first network 
domain, using separate unshared inter-domain policy
free, independently administrable network
databases, wherein each of the separate unshared 
inter-domain policy free, independently adminis- 
trable network databases includes IP addresses, con- 
trol and policy information defined for its one of the 
different network domains, and 

route IP packets of different ones of the subscribers 
using those of the virtual routers for the different 
ones of the network domains to which those sub- 
scribers currently belong. 

78. The single network device of claim 77 wherein the 
subscribers include subscribers of a corporate network and
subscribers of different Internet service providers. 

79. The single network device of claim 77 wherein the 
first network domain is a layer 3 network domain and the 
different network domains layered upon the first network 
domain are layer 3 network domains. 

80. The single network device of claim 77 further com- 
prising the set of instructions to cause the single network 
device to tunnel IP packets of certain of the subscribers 
between separate physical locations of a virtual private 
network with the one of the different virtual routers to which 
the certain of the subscribers belong. 

81. The single network device of claim 77 further com- 
prising the set of instructions to cause the single network 
device to process the subscribers in accordance with an 
authorization, authentication and accounting protocol.
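
The arrangement recited in the claims above (several isolated virtual routers in one device, each with its own separately stored network database and its own administrator) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the claimed implementation: the class names `NetworkDatabase`, `VirtualRouter` and `NetworkDevice`, the string-based administrator check, and the exact-match address lookup are all hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class NetworkDatabase:
    """Addresses plus control and policy information for one domain."""
    routes: Dict[str, str] = field(default_factory=dict)   # address -> egress port
    policy: Dict[str, str] = field(default_factory=dict)   # policy name -> value


class VirtualRouter:
    """One isolated router instance; only its own administrator may change it."""

    def __init__(self, domain: str, admin: str) -> None:
        self.domain = domain
        self._admin = admin
        self._db = NetworkDatabase()  # stored separately for each router

    def add_route(self, requester: str, address: str, port: str) -> None:
        # Hypothetical access check: reject anyone but this domain's administrator.
        if requester != self._admin:
            raise PermissionError(f"{requester} does not administer {self.domain}")
        self._db.routes[address] = port

    def route(self, address: str) -> Optional[str]:
        # Exact-match lookup keeps the sketch short; a real router would
        # typically use longest-prefix matching.
        return self._db.routes.get(address)


class NetworkDevice:
    """Single device hosting one virtual router per administrative domain."""

    def __init__(self) -> None:
        self._routers: Dict[str, VirtualRouter] = {}
        self._subscriber_domain: Dict[str, str] = {}

    def create_router(self, domain: str, admin: str) -> VirtualRouter:
        router = VirtualRouter(domain, admin)
        self._routers[domain] = router
        return router

    def bind_subscriber(self, subscriber: str, domain: str) -> None:
        self._subscriber_domain[subscriber] = domain

    def route_packet(self, subscriber: str, address: str) -> Optional[str]:
        # A subscriber's packets only ever consult the database of its own domain.
        router = self._routers[self._subscriber_domain[subscriber]]
        return router.route(address)
```

Because each router holds its own database, two domains in this sketch can use the same address and still resolve it differently, and an administrator of one domain cannot modify the other domain's routes.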
DYNAMIC POLICY-BASED APPARATUS
FOR WIDE-RANGE CONFIGURABLE
NETWORK SERVICE AUTHENTICATION
AND ACCESS CONTROL USING A
FIXED-PATH HARDWARE CONFIGURATION

COPYRIGHT NOTICE

A portion of the disclosure of this patent document
contains material which is subject to copyright protection.
The copyright owner has no objection to the xerographic
reproduction by anyone of the patent document or the patent
disclosure in exactly the form it appears in the Patent and
Trademark Office patent file or records, but otherwise
reserves all copyright rights whatsoever.

BACKGROUND OF THE INVENTION

This invention relates to internetwork communications
and data exchanges, and in particular to the security of
information exchange between computer networks to inhibit
and detect attempts at vandalism, espionage, sabotage or
inadvertent destruction of data.

Computer networks connect multiple computer systems
together, allowing them to share information. Initially, the
computers were in one, secure location. As the utility of
networks grew, it became more and more desirable to
connect networks at different locations to allow information
to flow between the computer systems at all sites. As the
number of computer systems grew beyond the point where
each user of the network was well known to all other users,
the need for a mechanism to describe and enforce a policy
for access, known as a "security policy", became apparent.

Two major techniques developed for security policy
enforcement. The first is packet filtering, in which a security
policy specifies what types of connections are allowed and
permits or denies passage of TCP/IP packets of specific
types through a router. The second technique is application
filtering, which operates at a higher level, examining the
specific transactions that pass through a TCP/IP connection,
and allowing them or denying them based on the specific
action being attempted, or the identity of the requester.

When combined, these two techniques comprise a
firewall, whose purpose is to implement and enforce the
security policy of an organization regarding connections
between two or more networks. Historically, there have been
two major types of firewalls: custom and commercial. A
custom firewall is a device or collection of devices designed,
purchased, assembled, configured and operated by an
organization for the purposes of guarding a network
interconnection. A commercial firewall collects many of the
components of a custom firewall into a single device, and is sold
(and sometimes configured) by a company to make the
installation of a firewall easier and more cost effective.

Custom firewalls have many potential drawbacks. For
one, because they are designed and constructed by a single
organization that may not have extensive experience in the
problems of firewall design, they may not account for many
known problems. Because they are designed and built for a
specific purpose, they are typically very difficult to adapt to
new policies. This often requires a significant redesign
effort, and additional hardware. Because they are built in a
unique manner, each custom firewall requires special
software, special training, and special expertise in
modifications that does not translate into other firewall
installations.

Commercial firewalls are designed to consolidate as many
services as possible into a single box. That box is then used
as the focus of a customer-specific firewall. Because some
services, such as packet filtering, are often done better in
routers, commercial firewalls are rarely used by themselves.
Additional devices, and a design for their use, is often
required, which returns the customer to many of the
problems inherent in a custom firewall.

In addition, commercial firewalls are configured by the
user, who may be unaware of many of the issues and
problems of security policy design. It is estimated that more
than 30% of all firewall penetrations happen through a
commercial or custom firewall. This is typically because of
poorly thought out configuration.

Because much of the functionality of a commercial
firewall is concentrated in a single box, these devices also invite
other problems. If the device fails, all communication
between networks is cut off. There is no ability to gracefully
degrade service. If the security of one service of the box is
compromised, this can open a path for an attacker to
compromise other services and widen their access. Also, the
design of the single-box firewall very strongly affects the
types of policies available to the customer. If a box is
designed primarily as an applications gateway device, it is
very difficult to configure it in a firewall that will permit
some services to be performed via packet-filtering only.

One problem that is shared by both types of firewalls is
that of scalability. Because each type of firewall has strong
hardware/software/configuration customizations for each
specific customer, managing the firewalls of more than one
customer is very difficult. Making significant policy changes
in multiple customer firewalls is also extremely difficult.
Because of these scalability problems, it has been quite
difficult for a company to offer managed firewall services to
many customers, since the scaling problems escalate with
each new customer.

SUMMARY OF THE INVENTION

An improved security handler is provided by virtue of the
present invention. In one embodiment of a security handler
according to the present invention, a security handler
includes means for obtaining customer security policies, a
plurality of packet processing components with
communications paths therebetween and configurable policy
enforcement means, for enforcing a packet policy over the
communications paths.

One advantage of the present invention is that a single
configuration of physical components can be configured to
provide a wide range of security policy choices while
remaining capable of solving the foregoing problems of the
prior art.

A further understanding of the nature and advantages of
the inventions herein may be realized by reference to the
remaining portions of the specification and the attached
drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a security handler
implementation abbreviated herein as "Device FW".

FIG. 2 is a schematic of a TCP/IP packet.

FIGS. 3, 3a show a representation of the process of packet
filtering.

FIG. 4 shows the points within Device FW with
representative points at which a packet filtering policy can be
established.

FIG. 5 shows an example of an applications level filter.
FIG. 6 shows the points within Device FW with
representative points at which an application filtering policy can
be established.

FIG. 7 is a simplified block diagram of Device FW
showing a path for traffic between a workstation located on
a Customer Protected Network (CPN) and a server on the
Internet via a directly-routed, packet-filtered path.

FIG. 8 is a simplified block diagram of Device FW
showing a path for traffic between a workstation on the CPN
and an Internet server via an application-filtered connection.

FIG. 9 is a simplified block diagram of Device FW
showing a path for traffic between a workstation on the
Internet and a server on the CPN via a directly-routed,
packet-filtered connection.

FIG. 10 is a simplified block diagram of Device FW
showing a path for traffic between a workstation on the
Internet and a server on the CPN via an application-filtered
connection.

FIG. 11 is a simplified block diagram of Device FW
showing a path for traffic between a workstation on the
Internet and a server on a Customer Exported Network
(CEN) via a directly routed, packet-filtered connection.

FIG. 12 is a simplified block diagram of Device FW
showing a path for traffic between a workstation on the
Internet and a server on the CEN via an application-filtered
connection.

FIG. 13 is a simplified block diagram of Device FW
showing a path for traffic between an Internet workstation
and a server hosted within Device FW.

FIG. 14 is a simplified block diagram of Device FW
showing another path for traffic between an Internet
workstation and a server hosted within Device FW.

FIG. 15 is a simplified block diagram of Device FW
showing yet another path for traffic between an Internet
workstation and a server hosted within Device FW.

FIG. 16 is a simplified block diagram of Device FW
showing a path a replication connection would use to copy
data from a Customer Backend Server (CBS) to a Hosted
Backend Server (HBS).

FIG. 17 is a simplified block diagram of Device FW
showing a path for traffic from a workstation on the CPN for
directly-routed, packet-filtered access to a Hosted Customer
Server.

DESCRIPTION OF THE PREFERRED
EMBODIMENTS

FIG. 1 - System Overview

FIG. 1 shows the components of Device FW and the way
in which each component is interconnected. Also shown is
an example external network (the Internet) and an example
internal network (the Customer Network) to clarify the
relationship between FW and an external and internal
network.

In the preferred embodiment, Device FW is a collection of
components, physically connected by networks and
telecommunications circuits in an arrangement which permits a
safe, but flexible range of usage strategies. Networks allow
one component to communicate with another component in
close physical proximity and the network can be a form of
[...], such as Ethernet, Fast Ethernet,
FDDI, ATM or other networks that provide similar
functionality. In the preferred embodiment described herein,
network traffic is based on the TCP/IP network protocol.

FW communicates with components that are not in close
physical proximity using telecommunications circuits. A
telecommunications circuit can be any point-to-point link
capable of carrying TCP/IP traffic, such as a T-1 line, a 56K
line, a modem-based telephone connection, a microwave
relay, a fiber-optic circuit.

TCP/IP packets are moved between networks or between
telecommunications circuits and networks, or between
different telecommunications circuits, by routers. [...]
service port, destination service port, protocol, size) [...]
a packet [...] model comparable to that of Cisco
Systems IOS version [...] or later. Routers are available
from various manufacturers such as Cisco, Bay Network, or
3Com.

Servers are used to provide information to a network or
collect information from a network. A server provides a
service which is made available to workstations across a
network. Servers can come from a variety of manufacturers,
such as Sun Microsystems, Hewlett Packard, Silicon
Graphics, Digital Equipment Corporation. Servers run a variety of
operating systems such as Solaris, SunOS, HP/UX, BSDI,
Linux, FreeBSD. Preferably, whatever operating system is
used, it should have the ability to eliminate extraneous
software and services, control access to services and log
access to services.

[...]
World-Wide Web (WWW), File Transfer (FTP), RealAudio,
RealVideo, authentication, database, logging. Gateways are
used to apply a security policy to an application's network
traffic. Workstations are used to connect human beings
to servers, and are also used to allow maintainers of Device
FW to inspect and modify the policy of Device FW.

Referring now to FIG. 1, the description of several
individual components is now provided. An External Router
(ER) controls traffic to and from the External network and
enforces the outermost layer of the Security Policy. A
Maintenance Router (MR) connects the portions of Device
FW that implement the security policy with the Maintenance
Network that host servers use to record the behavior of FW
and workstations used to inspect or modify the behavior of
FW.

The Customer Local Router (CLR) connects Device FW
to the Customer Site Router (CSR). The CLR also limits the
connections to and from a customer. The CSR connects a
Customer Exported Network (CEN) and the Customer
Protected Network (CPN). The CSR defines what types of
traffic can pass between the CPN and Device FW, between
the CEN and Device FW and between the CEN and the
CPN.

A Log Server (LS) is used by FW to record the state and
behavior of components. From this recorded information,
statistics can be gathered about attacks, performance,
connections established, and the success or failure of particular
portions of the Security Policy. A Maintenance Workstation
(MW) is used to inspect or change the behavior of Device
FW.

A typical FW contains most or all of the following types
of components:

Networks
Telecommunications circuits
Routers
Servers
Gateways
Workstations
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The embodiment of FW shown in FIG. 1 is shown 
including the following components: 
ER — External Router 
MR — Maintenance Router 
CLR — Customer Local Router 
CSR — Customer Site Router 
LS — Log Server 
MW — Maintenance Workstation 



In addition, FIG. 1 shows the following representative components, of which there may be one or more actual components in an implementation (only one of each of these is shown for clarity and to illustrate the principles involved):
HCS — Hosted Customer Server
HES1 — Hosted External Server, Type 1
HES2 — Hosted External Server, Type 2
HBS — Hosted Backend Server
HAG — Hosted Applications Gateway
HAS — Hosted Authentication Server

As shown in FIG. 1, FW also includes the following networks and telecommunications circuits:
ESN — External Services Network
HSN — Hosted Services Network
PSN — Proxy Services Network
ETC — External Telecommunications Circuit
CTC — Customer Telecommunications Circuit

The External Network (EN) is shown including:
ETC — External Telecommunications Circuit
IN — Internet Network
IS — Internet Server
IW — Internet Workstation

Also shown for purposes of clarification is a Customer Network, which includes:
CS — Customer Server
CW — Customer Workstation
CPN — Customer Protected Network
CEN — Customer Exported Network
CES — Customer Exported Server
CBS — Customer Backend Server

There may be any number of CBS and CES systems connected to the CEN. One of each is shown for purposes of clarification. ER is connected to ETC, ESN and PSN. MR is connected to ESN, HSN, PSN and MN. CLR is connected to CTC, PSN and HSN. CSR is connected to CTC, CPN and CEN. ESN is connected to ER, MR, HES1, and HES2. HSN is connected to MR, CLR, HCS, HES1, HBS, HAG and HAS. PSN is connected to ER, MR and HAG. MN is connected to MR, LS and MW. CEN is connected to CSR, CBS, and CES.

FIG. 2 shows a representation of a TCP/IP packet. A TCP/IP packet is a sequence of bits that is transmitted across a network or telecommunications circuit. Computers connected to the network can transmit or receive TCP/IP packets in order to implement a variety of services (such as remote terminal sessions, file transfer, and electronic mail). A service may require one or more packets to be transmitted and/or received.

A TCP/IP packet includes an "IP" layer and a "Payload" layer. The IP layer is used as an "envelope" to hold the Payload layer. The IP layer contains the address of the source machine (the "transmitter" of the packet), the address of the destination machine (the "receiver" of the packet) and other information pertaining to the delivery of the packet to its intended recipient. All of this information is kept in fields within the packet. The payload portion of the IP packet contains information pertaining to the specific application using the packet. This information is organized as either a TCP packet, a UDP packet or an ICMP packet.

A TCP packet is used to carry part of a connection that is "stream-like". This would be information that is in a sequence, such as the contents of a file or a virtual terminal session. TCP packets are in sequence, and are "reliable", which means that they are guaranteed to arrive in order if they arrive at all. If they do not arrive, their absence will be noticed. A UDP packet is intended for data that is nonsequential. UDP packets are often called "datagrams" to emphasize their similarity to telegrams, in which the entire context of the message is contained within the single packet or message. A UDP packet might be used for an occasional status update or to log a particular event. UDP packets are nonsequential, and arrive in no predefined order. They are "unreliable" in that a packet may not arrive and if it does not its absence will not be noticed. An ICMP packet is used for network signalling. In particular, ICMP messages are used to indicate a problem of some kind with the transmission of packets from their source to their destination. Additionally, ICMP packets can be used to verify that a particular destination is reachable without problems.

Each type of payload packet contains a data field that can be filled by the transmitting machine with data that is intended for a specific type of service. Thus, if the application in use was a virtual terminal session, and the user sent a character by typing on a keyboard, the virtual terminal program in use would create an IP packet intended for the destination machine, which contained a TCP packet intended for the virtual terminal host program on that machine, which contained a data field containing the character typed.

FIG. 3 shows a representation of the process of packet filtering. TCP/IP packets (hereafter referred to as simply packets) are transmitted through a network by means of devices called routers. Routers are connected to telecommunications circuits or other networks by means of interfaces. A router receives a packet on a particular interface, and by examining that packet and an internal map showing the networks reachable by all of its interfaces (known as a routing table) selects an output interface for that packet and transmits it through that interface. This process is known as routing. In some cases, a router receives a packet which is intended for a destination that cannot be reached from that router. In this case, transmission fails, and the packet is erased and an error message, typically in the form of an ICMP packet (see FIG. 2), is returned to the source of the packet. This type of failure usually indicates a problem with the network. However, some types of routers implement a very similar mechanism for causing packets to fail to reach their destination because of security reasons, not network failure. This process is known as packet filtering, because it allows a network manager to filter out unwanted or unauthorized packets from reaching machines on that network.

Packet-filtering is based on the fields used within a packet (see FIG. 2). A list of rules is loaded into the router that describes the test that a router should apply to each packet to determine whether or not it should be passed or prevented from passing (blocked). This list of rules is known as a Router Packet Filtering Policy. [illegible] is loaded when the router is first turned on, but in practice it may be updated occasionally by human intervention without [text illegible] conditions that describe how packets are to be tested for passage through the router. Each entry in the list consists of a condition and an action. If the packet meets the condition, the action is performed. Generally the actions are to pass or to block. For passing, the packet is sent through the router toward its eventual destination. For blocking, the packet is erased and no transmission toward the intended destination takes place. If the packet does not meet the condition, then the next rule in the list is tried, and this process continues until all the rules have been applied or until a condition is met (see FIG. 3a).

Packet-filtering routers typically have an implicit final rule which is either ALL PASS or ALL BLOCK, where the condition ALL is one that every packet meets. This final rule, whether implicit or explicit, determines the character of how all of the previous rules must be written. Because of this final condition, there is always a rule that is applicable to a given packet. The case where the final rule is ALL PASS is known as a permissive policy. With ALL PASS, the previous rules must be written so that all cases of packet blockage are specified in advance, and any packet that does not match one of these cases is allowed to pass. The case where the final rule is ALL BLOCK is known as a restrictive policy. With ALL BLOCK, the previous rules must be written so that all cases of packet passage are specified in advance and any packet that does not match one of these cases is blocked. The conditions for packet matching are based on the fields within the IP, TCP, UDP and ICMP packets.

FIG. 3 shows two cases of packets entering a router from a source network, intended for a destination network. A Router Packet Filtering Policy has been previously defined for that router. Packet A is the first test case. Packet A has an arbitrary field (designated by the term field) equal to value x. Packet B is the second test case. Packet B has field equal to value y. Packet A enters the router, and the first rule of the packet filtering policy is applied. Because field is equal to x, this condition matches. The action for this rule is "block", so this packet is not permitted to pass through the router. No further rules are applied. Packet B enters the router, and the first rule of the packet filtering policy is applied. Because field is not equal to x in packet B, this rule does not apply. The second rule is applied, and it matches. The action for this rule is "pass" and so this packet is passed on to the destination network. No further rules are applied. Packet filtering rules can be written for varying combinations of conditions and fields.

FIG. 4 shows the points within Device FW at which a Packet-Filtering Policy can be established, labelled there as:
Per-in
Per-esn
Per-psn
Pmr-esn
Pmr-mn
Pmr-psn
Pmr-hsn
Pclr-psn
Pclr-hsn
Pclr-ctc
Pcsr-ctc
Pcsr-psn
Pcsr-cpn

Each of these packet filtering policy application points is located at the interface between a router and a network interface or at the interface between a router and a telecommunications link interface. Each packet filtering policy application point can be configured to permit or deny packet flow between the router and the interface. Policy can be applied to packet flow between the interface and the router or between the router and the interface. These policies may be asymmetric, allowing packet flow in one direction but blocking it in the converse direction, or they may be symmetric, allowing or blocking packet flow in both directions.

FIG. 5 shows an example of an applications level filter. Examples of applications would include remote terminal service, E-mail, file transfer, games. An application filter is a program running on a computer that is interposed in the network between the source and destination networks. This computer acts as a barrier that enforces a security policy between the two networks. A person wishing to use a service on the other side of the barrier must instead connect to the barrier computer. This computer may be configured to run software known as a proxy server. A proxy acts as a surrogate for the service on the far side. A proxy server must be running on the barrier computer for each service that is to be passed through the barrier. The barrier computer must be configured to pass no other services than those specifically allowed by the proxy servers running.

A barrier computer is configured with an Application Filtering Policy which determines the services which will be passed and the manner in which they are passed. Services may be passed directly through the barrier, or additional strictures may be placed upon them before passage is permitted. An example of this would be authentication. A security policy may be defined that requires that only specific people be permitted to access services through a barrier. In order to determine that an authorized person is connecting to the proxy server, the proxy server may require some means of authentication, such as a user name and password. Once the proper credentials are supplied, the connection passes through the proxy server and the barrier, to the service on the other side. Unlike a packet filtering policy, an applications filtering policy is always restrictive, which means that unless a service has been explicitly configured with a proxy server running on the barrier computer, it will not be passed through.

FIG. 5 shows a typical barrier computer configuration, with an authentication server and three proxy servers running. An applications filtering policy has been configured into the barrier computer, allowing connections to pass on port 23 (typically the remote terminal session port), port 25 (electronic mail) and port 666 (a game). A connection request for port 23/tcp service arrives at the barrier computer and is passed to the port 23/tcp proxy. This proxy has a policy of authentication before passage, so the user's credentials are referred to the authentication server, which approves them. The proxy then passes the connection through to the destination machine. A connection request for port 25/tcp service arrives at the barrier computer and is passed to the port 25/tcp proxy. This proxy is configured to pass the traffic along, and does so with no further authentication. A packet for port 666/udp service arrives at the barrier computer and is passed to the port 666/udp proxy. This proxy server is configured to pass the packet along and does so with no further authentication. A connection request for port 512/tcp service, the remote command execution service, arrives at the proxy. The application filtering policy does not explicitly cover this service, and so no proxy is configured for this. The request for service is denied.
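The barrier computer behavior described for FIG. 5 can be illustrated with a minimal Python sketch. It is not taken from the patent: the names PROXIES, CREDENTIALS, authenticate and handle_request, and the sample user, are hypothetical, and the dictionary lookup merely stands in for the authentication server. The point it shows is that the application filtering policy is restrictive: a service with no configured proxy is denied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proxy:
    """One proxy server on the barrier computer (illustrative only)."""
    requires_auth: bool


# Services that have a proxy configured, keyed by (port, protocol),
# mirroring the three proxies in the FIG. 5 example.
PROXIES = {
    (23, "tcp"): Proxy(requires_auth=True),    # remote terminal session
    (25, "tcp"): Proxy(requires_auth=False),   # electronic mail
    (666, "udp"): Proxy(requires_auth=False),  # a game
}

# Hypothetical credential store standing in for the authentication server.
CREDENTIALS = {"alice": "s3cret"}


def authenticate(user, password):
    """Approve only a known user presenting the matching password."""
    return user in CREDENTIALS and CREDENTIALS[user] == password


def handle_request(port, protocol, user=None, password=None):
    """Return "pass" or "deny" for a request arriving at the barrier."""
    proxy = PROXIES.get((port, protocol))
    if proxy is None:
        # Restrictive policy: no proxy configured means no passage.
        return "deny"
    if proxy.requires_auth and not authenticate(user, password):
        return "deny"
    return "pass"
```

With this sketch, a request for port 512/tcp is denied because no proxy exists for it, while port 23/tcp passes only when valid credentials are supplied.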
FIG. 6 shows several points at which an Application Filtering Policy can be established, which are labelled:
Ahes1-esn
Ahes2-esn
Ahcs-hsn
Ahes1-hsn
Ahbs-hsn
Ahes2-pn
Ahbs-pn
Ahas-hsn
Ahag-hsn
Ahag-psn
Als-mn
Amw-mn

Each application filtering policy application point is located at the interface between a computer and a network. Applications filtering policy can be defined based on many different types of criteria, which could include:
Source address
Destination address
Source port
Destination port
Protocol (TCP, UDP, ICMP)
Time of day
User authentication
Server workload

The more constraining criteria an application filtering policy can apply, the more secure it will be. However, a policy configuration for a given customer requires a balance between usability and security if it is to be practical.

FIG. 7 shows the path that traffic would take between a workstation located on the Customer Protected Network (CPN) and needing to access a server on the Internet via a directly-routed, packet-filtered path. In order to achieve this, packet-filtering policy must be set at the following points:
Pcsr-cpn
Pcsr-ctc
Pclr-ctc
Pclr-psn
Per-psn
Per-in

FIG. 8 shows the path that traffic would take for a connection from a workstation on the CPN to access an Internet server by means of an application-filtered connection. In order to achieve this, packet-filtering policy must be set at the following points:
Pcsr-cpn
Pcsr-ctc
Pclr-ctc
Pclr-psn
Per-psn
Per-in

Additionally, application filtering policy must be set at the point labelled "Ahag-psn". The packet filtering policy in this case would be different than that shown in FIG. 7. In particular, the policy set at Pclr-psn would be set to permit traffic only to the HAG machine, and the policy at Per-psn would be set to accept traffic only from the HAG machine.

FIG. 9 shows the path that traffic would take for a connection from a workstation on the Internet to access information on a server on the CPN by means of a directly-routed, packet-filtered connection. In order to achieve this, packet-filtering policy must be set at the following points:
Per-in
Per-psn
Pclr-psn
Pclr-ctc
Pcsr-ctc
Pcsr-cpn

FIG. 10 shows the path that traffic would take for a connection from a workstation on the Internet to access information on a server on the CPN by means of an application-filtered connection. In order to achieve this, packet-filtering policy must be set at the following points:
Per-in
Per-psn
Pclr-psn
Pclr-ctc
Pcsr-ctc
Pcsr-cpn

Additionally, application-filtering policy must be set at the point labelled "Ahag-psn". The packet filtering policy in this case would be different than that shown in FIG. 9. In particular, the policy set at Pclr-psn would be set to permit traffic only to the HAG machine, and the policy at Per-psn would be set to accept traffic only from the HAG machine.

FIG. 11 shows the path that traffic would take for a connection from a workstation on the Internet to access information on a server on the Customer Exported Network (CEN) by means of a directly routed, packet-filtered connection. This is a more typical configuration than the one shown in FIG. 10, because a CEN is specifically set up to support this type of access, while restricting the path from the Internet to the CPN. This configuration also provides much more security for the CES, the CEN, and the [text illegible]. In order to achieve this, packet-filtering policy must be set at the following points:
Per-in
Per-psn
Pclr-psn
Pclr-ctc
Pcsr-ctc
Pcsr-psn

FIG. 12 shows the path that traffic would take for a connection from a workstation on the Internet to access information on a server on the Customer Exported Network (CEN) by means of an application-filtered connection. In order to achieve this, packet-filtering policy must be set at the following points:
Per-in
Per-psn
Pclr-psn
Pclr-ctc
Pcsr-ctc
Pcsr-cen

Additionally, application-filtering policy must be set at the points labelled "Ahag-hsn" and "Ahag-psn". The packet filtering policy in this case would be different than that shown in FIG. 11. In particular, the policy set at Pclr-psn would be set to permit traffic only to the HAG machine, and the policy at Per-psn would be set to accept traffic only from the HAG machine.

FIG. 13 shows the path that traffic from an Internet workstation would take to access a server that was hosted within Device FW. This type of hosting would provide much more security than the configurations shown in FIGS. 9, 10, 11 and 12, as well as better performance. In order to achieve this, packet-filtering policy is set at the points labelled "Per-in" and "Per-psn" while application-filtering policy is set at the points labelled "Ahes1-esn" and "Ahes1-hsn".

FIG. 14 shows the path that traffic from an Internet 
workstation would take to access a server that was hosted 
within Device FW. This type of hosting would provide much 
more security than the configurations shown in FIGS. 9, 10, 
11 and 12, as well as better performance. This case differs 
from that shown in FIG. 13 in that the Hosted External 
Server (HES1) requires a connection to a "back-end" server 
such as a database server. This type of configuration is very 
common for large World-Wide-Web servers. The server in 
this case is connected to the Customer Exported Network. In 
order to achieve this, packet-filtering policy must be set at 
the following points: 

Per-in 

Per-psn 

Pclr-hsn 

Pclr-ctc 

Pcsr-ctc 

Pcsr-cen 

Additionally, application-filtering policy must be set at the 
points labelled "Ahesl-esn" and "Ahesl-hsn". 

FIG. 15 shows the path that traffic from an Internet
workstation would take to access a server that was hosted
within Device FW. This case differs from that shown in FIG.
13 in that the Hosted External Server (HES1) requires a
connection to a "back-end" server such as a database server.
This type of configuration is very common for large
World-Wide-Web servers. In this case, the server is also
hosted within Device FW. In order to achieve this, packet-
filtering policy is set at the points labelled "Per-in" and
"Per-psn" and application-filtering policy is set at the points
labelled "Ahes2-esn", "Ahes2-pn", "Ahbs-pn" and "Ahbs-
hsn".

FIG. 16 shows the path that a replication connection
would use to copy data from a Customer Backend Server
(CBS) to a Hosted Backend Server (HBS). Doing this, in
conjunction with the configuration shown in FIG. 15, pro-
vides a much more secure and robust mechanism for build-
ing a high-performance, high-capacity External Server. In
order to achieve this, packet-filtering policy is set at the
points labelled "Per-in" and "Per-psn" and applications
filtering policy is set at the points labelled "Ahbs-pn" and
"Ahbs-hsn".

FIG. 17 shows the path of traffic from a workstation on the 
CPN for directly-routed, packet-filtered access to a Hosted 
Customer Server. This approach would be taken for servers 
that would normally reside on the CPN, but for various 
reasons such as security or difficulty of administration, have 
been outsourced and now reside within Device FW. In order 
to accomplish this, packet-filtering policy must be set at the 
following points: 

Pcsr-cpn 

Pcsr-ctc 

Pclr-ctc 

Pclr-hsn 

Additionally, applications filtering policy must be set at the 
point labelled "Ahcs-hsn". 
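The policy application points named for the hosted-server paths of FIGS. 13 to 17 can be collected into a small lookup table. The Python sketch below only transcribes the point labels given in the text above; the names PATHS and paths_using are hypothetical and not part of the described apparatus. It may help when checking which traffic paths are affected by a change at a single point.

```python
# Policy application points per traffic path, transcribed from the
# descriptions of FIGS. 13-17. "packet" lists packet-filtering points,
# "application" lists application-filtering points.
PATHS = {
    "FIG. 13": {
        "packet": ["Per-in", "Per-psn"],
        "application": ["Ahes1-esn", "Ahes1-hsn"],
    },
    "FIG. 14": {
        "packet": ["Per-in", "Per-psn", "Pclr-hsn",
                   "Pclr-ctc", "Pcsr-ctc", "Pcsr-cen"],
        "application": ["Ahes1-esn", "Ahes1-hsn"],
    },
    "FIG. 15": {
        "packet": ["Per-in", "Per-psn"],
        "application": ["Ahes2-esn", "Ahes2-pn", "Ahbs-pn", "Ahbs-hsn"],
    },
    "FIG. 16": {
        "packet": ["Per-in", "Per-psn"],
        "application": ["Ahbs-pn", "Ahbs-hsn"],
    },
    "FIG. 17": {
        "packet": ["Pcsr-cpn", "Pcsr-ctc", "Pclr-ctc", "Pclr-hsn"],
        "application": ["Ahcs-hsn"],
    },
}


def paths_using(point):
    """Return the figures whose path requires policy at the given point."""
    return [
        fig
        for fig, points in PATHS.items()
        if point in points["packet"] or point in points["application"]
    ]
```

For example, paths_using("Pclr-hsn") reports the paths of FIG. 14 and FIG. 17, the two cases in which traffic crosses between the CLR and the Hosted Services Network.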

It should be readily apparent to those skilled in the art that,
after reading this description, a wide range of modifications
may be made to the security policy of a particular embodi-
ment of this apparatus. The scope of the invention should,
therefore, be determined not with reference to the above
description, but instead should be determined with reference
to the appended claims along with their full scope of
equivalents.

What is claimed is: 

1. A security handler for packet transfer between an
insecure network and a secure network wherein packets are
passed or blocked between the insecure network and the
secure network to secure the secure network against attacks
from the insecure network, the security handler comprising:
means for obtaining customer security policies, wherein a
customer security policy is a set of one or more rules
defining a set of capabilities that are allowed or disal-
lowed for a given customer's secure network, modifi-
able in response to security attacks encountered, and
wherein customer security policies can be distinct for
distinct customers;
a plurality of packet processing components;
a plurality of communication paths between components
of the plurality of packet processing components; and
configurable policy enforcement means, at each connec-
tion of a communication path and a packet processing
component, for enforcing a packet policy for packets
transported between the communication path and the
packet processing component, wherein the packet
policy is a function of the customer security policies.

2. A method of securing a plurality of secure customer
networks while connected to one or more insecure networks,
wherein packet traffic on an insecure network can be gen-
erated and observed without authorization, comprising the
steps of:

interposing a packet processor between at least one of the
secure networks and at least one of the insecure
networks, wherein the packet processor includes a
plurality of paths over which packets are transported
between the at least one insecure network and the at
least one secure network;

identifying control points within the packet processor,
wherein a control point is a node through which packet
traffic having predetermined characteristics flows and
each control point has a set of one or more predeter-
mined characteristics associated therewith;

storing customer security policies for each of the plurality
of secure customer networks, wherein a customer secu-
rity policy specifies capabilities that are allowed or not
allowed with respect to an associated secure customer
network, wherein a capability is provided by packet
traffic having predetermined characteristics associated
with that capability and wherein customer security
policies can be distinct for distinct customers;

controlling packet traffic flow between control points in
accordance with the customer security policies to limit
the capabilities of each secured customer network to
those capabilities allowed by the customer security
policies; and

modifying customer security policies in response to
security attacks encountered.

3. The method of claim 2, wherein the capabilities con-
trolled in the step of controlling packet traffic flow include
e-mail transport, hypertext transport protocol packet
transport, file transport and remote terminal access.

4. The method of claim 2, wherein the predetermined
characteristics include a packet source address and port, a
packet destination address and port, a protocol, a time of
day, and an authorization status.

5. A method of controlling a data exchange between a
customer network and an external network comprising the
steps of:

for each data object, a data object comprising one or more
packets, identifying object features including a packet
source address and port, a packet destination address
and port and an object protocol;

applying policy rules from a policy table to features of the
data object, where a policy rule is a variable rule,
determined by a customer policy, for excluding or
including the data object based on the object's features,
wherein customer policies can be distinct for distinct
customers and customer policies are modifiable in
response to security attacks encountered; and

if a policy rule indicates that a data object should be
included and no policy rule indicates that the data
object would be excluded, forwarding the data object
from its source to destination.

6. A common hardware platform for handling multiple
customer networks, each of which may have different secu-
rity policies, the common hardware platform comprising:

a policy database containing a plurality of policies for the
multiple customer networks, wherein a policy is a rule
about what capabilities are allowed or disallowed for
traffic between a customer network and another net-
work when the conditions of the rule are met and
wherein policies can be distinct for distinct customers
and policies are modifiable in response to security
attacks encountered; and

means for controlling data traffic between a customer
network and another network based on the policies in
the policy database.

7. The common hardware platform of claim 6, wherein the
means for controlling data traffic selects an action from
permitting the data traffic to flow, denying the data traffic and
redirecting the data traffic.

ABSTRACT



A packet filter for a router performs generalized packet
filtering allowing range matches in two dimensions, where
ranges in at least one dimension are defined as
a power of two. To associate a filter rule with a received
packet EP, the packet filter employs a 2-dimensional interval
search and memory look-up with the filter-rule table. Values
of s_m of filter-rule r_m = (s_m, d_m) in one dimension are desirably
ranges that are a power of two, such as prefix ranges, which
are represented by a binary value having a "length" defined
as the number of bits of the prefix. The d_m may be single
points, ranges defined as prefix ranges, and/or ranges defined
as continuous ranges. The packet filter employs preprocess-
ing of the filter-rules based on prefix length as a power of 2
in one dimension and decomposition of overlapping seg-
ments into non-overlapping intervals in the other dimension
to form the filter-rule table. A preprocessing algorithm
searches in one dimension through filter rules and arranges
the corresponding filter-rule rectangle segments according to
prefix length. Then, in the other dimension, the overlapping
filter rectangle segments are decomposed into non-
overlapping intervals, and the highest priority filter-rule
overlapping each non-overlapping interval is associated
with that interval. A filter-rule table is then constructed with
entries ordered according to prefix length and non-
overlapping interval, each entry associated with a particular
filter-rule. A packet classification algorithm then matches the
field or other parameter information in the packet to the
filter-rule table entries to identify the filter-rule rectangle
associated with the filter-rule to be applied to the packet.
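The decomposition step mentioned in the abstract can be sketched in one dimension. The Python below is an illustration only, not the patented two-dimensional method: it assumes inclusive integer ranges, assumes that a lower priority number means higher priority, and uses the hypothetical names decompose and classify. It splits overlapping ranges into non-overlapping intervals, keeps the highest priority rule for each interval, and then classifies a value by binary search.

```python
from bisect import bisect_right


def decompose(rules):
    """Split overlapping ranges into non-overlapping intervals.

    rules: list of (lo, hi, priority, name) with inclusive integer bounds.
    Assumption: the smallest priority number is the highest priority.
    Returns a sorted list of (start, end, name).
    """
    # Each range start, and the point just past each range end, is a boundary.
    points = sorted({lo for lo, _, _, _ in rules}
                    | {hi + 1 for _, hi, _, _ in rules})
    intervals = []
    for start, nxt in zip(points, points[1:]):
        end = nxt - 1
        # Between consecutive boundaries a rule covers all or nothing.
        covering = [r for r in rules if r[0] <= start and end <= r[1]]
        if covering:
            best = min(covering, key=lambda r: r[2])
            intervals.append((start, end, best[3]))
    return intervals


def classify(intervals, value):
    """Return the rule name for value, or None if no interval contains it."""
    starts = [s for s, _, _ in intervals]
    i = bisect_right(starts, value) - 1
    if i >= 0 and intervals[i][0] <= value <= intervals[i][1]:
        return intervals[i][2]
    return None


# Example rules: B outranks A where the two overlap (50..99).
RULES = [(0, 99, 2, "A"), (50, 149, 1, "B"), (200, 220, 3, "C")]
TABLE = decompose(RULES)
```

Because the intervals do not overlap, a lookup needs only one binary search over the interval start points rather than a test against every rule.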

30 Claims, 9 Drawing Sheets 
PACKET CLASSIFICATION METHOD AND
APPARATUS EMPLOYING TWO FIELDS

CROSS-REFERENCE TO RELATED
APPLICATIONS

This application claims the benefit of the filing date of
U.S. provisional application No. 60/073,996, filed on Feb. 9,
1998.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates generally to packet forwarding
engines used in telecommunications, and, in particular,
to router algorithms and architectures for supporting packet
filter operations using two packet fields.

2. Description of the Related Art

Packet-based communication networks, such as the
Internet, typically employ a known protocol over paths or
links through the network. Commonly known protocols are,
for example, Transmission Control Protocol/Internet Protocol
(TCP/IP) or Reservation Set-up Protocol (RSVP). Routers
provided in a communication network provide a packet
forwarding function whereby input data, usually in the form
of one or more data packets, is switched or routed to a
further destination along a network link. FIG. 1 shows a
typical form of a data packet 20, which may be of variable
length. Data packet 20 comprises, for example, a header 125
and payload data 150. Header 125 contains fields or
parameters, such as a source address 130 where the data
originates and at least one destination address 135 where the
data is to be routed. Another parameter in the header 125
may be a protocol type 140 identifying a particular protocol
employed in the communication network.

FIG. 2 shows a router 245 of a network node receiving
streams or flows of data packets from input links 247 and
routing these packet streams or flows to output links 260. To
perform a forwarding function, router 245 receives a data
packet at an input link 247 and a control mechanism 250
within the router utilizes an independently generated look-up
table (not shown) to determine to which output link 260
the packet should be routed. It is understood that the packet
may first be queued in buffers 252 before being routed, and
that the forwarding function is desirably performed at a high
rate for high forwarding throughput.

Source and destination addresses may be logical
addresses of end hosts (not shown). Thus, data packet 20 of
FIG. 1 may further comprise unique source port numbers
137 and destination port numbers 139. Header 125 may also
include, for example, certain types of flags (not shown) in
accordance with protocol type 140, such as TCP, depending
upon the receiver or transmitter application.

Network service providers, while using a shared backbone
infrastructure, may provide different services to different
customers based on different requirements. Such
requirements may be different service pricing, security, or
Quality of Service (QoS). To provide these differentiated
services, routers typically include a mechanism for 1) classifying
and isolating traffic, or packet flows, from different
customers, 2) preventing unauthorized users from accessing
specific parts of the network, and 3) providing customized
performance and bandwidth in accordance with customer
expectations and pricing.

Consequently, in addition to the packet forwarding
function, router 245 of FIG. 2 may perform a packet filtering
function. Packet filtering may be employed, for example, as
"firewall protection" to prevent data or other information
from being routed to certain specified destinations within the
network. To perform packet filtering, the router 245 may be
provided with a table or list of filter rules specifying that
routing of packets sent from one or more of specified sources
is denied or that a specific action is to be taken for that packet
having a specified source address. Such packet filtering may
be employed by layer four switching applications.

Specifically, packet filtering parses fields from the packet
header 125 including, for example, both the source and
destination addresses. Parsing allows each incoming packet
to be classified using filter rules defined by network management
software, routing protocols, or real-time reservation
protocols such as RSVP.

Filter rules may also specify, for example, that received
packets with fields specifying a particular destination
address should or should not be forwarded through specific
output links, or that some other specific action should be
taken before routing such received packets. Thus, a variety
of filter rules may be implemented based on packet field
information. For example, such filter rules might be based
on 1) source addresses; 2) destination addresses; 3) source
ports; 4) destination ports; and/or 5) any combination of
these fields.

Packet filtering of the prior art generally requires either an
exact match operation of the fields or a match operation
defined in terms of field ranges for a filter rule. Field ranges
may specify, for example, ranges of source addresses, destination
addresses, source/destination port numbers, and/or
protocol types. Filter rules are then applied to every packet
that the router receives; that is, for each packet received by
the router, every filter rule is successively applied to each
packet to ascertain whether that packet is to be forwarded,
restricted, or re-routed according to the filter rule. However,
implementation of a large number of filter rules in a router
(e.g. 500 or more) is time consuming with respect to
processor execution time since all filter rules must be tested.
Hence, routers implementing filters having a large number
of filter rules have decreased throughput, compromising a
quality of service (QoS). Thus, for a router such as router
245 to maintain a relatively high level of throughput, the
filtering function must be performed at a very high rate.

IP packet header fields may contain up to 128 bits of
parameter information, including source and destination
addresses, physical source and destination port numbers,
interface number, protocol type, etc. Each of the fields or
parameters in the header may be represented as being along
an axis of a dimension. The general packet classification
problem of a packet filter may then be modeled as a
point-location in a multi-dimensional space. One or more
field values of a packet define a point in the multi-dimensional
space. A packet filter rule associated with a
range of values of each field defines an object in the multi-dimensional
space.

A point-location algorithm in a multi-dimensional space
with multi-dimensional objects finds the object that a particular
point belongs to. In other words, given a received
point EP={E_1, E_2, . . . E_D} in a space having D dimensions,
find one or more of a set of n D-dimensional objects
including the point (n being an integer greater than 0). The
general case of D>3 dimensions may be considered for the
problem of packet classification. As is known in the art, the
best algorithms optimized with respect to time or space have
either an O(log^(D-1) n) time complexity with O(n) space or an
O(log n) time complexity with O(n^D) space, where O( )
mathematically represents "on the order of." Comparing
algorithms on the basis of the order of operations is particularly
useful since operations may be related to memory
requirements (space) and execution time (time complexity).
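
As an illustrative sketch only, the point-location model described above can be expressed in Python as a linear scan in which every filter rule is tested against the packet in turn. The function name first_matching_rule, the inclusive integer ranges, and the sample rules are assumptions made for this example and do not appear in the original text.

```python
from typing import Optional, Sequence, Tuple

Range = Tuple[int, int]  # inclusive (low, high) bounds for one field (assumed encoding)


def first_matching_rule(point: Sequence[int],
                        rules: Sequence[Sequence[Range]]) -> Optional[int]:
    """Return the index of the first rule whose D-dimensional box contains the point."""
    for index, box in enumerate(rules):
        # A rule matches only if every field value lies inside that field's range.
        if all(low <= value <= high for value, (low, high) in zip(point, box)):
            return index
    return None  # no rule encloses the point


# Hypothetical two-field rules: (range in field 1, range in field 2).
rules = [
    [(0, 3), (4, 7)],   # rule 0
    [(2, 6), (0, 5)],   # rule 1
]
print(first_matching_rule((5, 1), rules))
```

Each look-up costs time proportional to the number of rules, which is the behavior the text identifies as too slow for routers holding hundreds or thousands of rules.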

Though algorithms with these complexity bounds are
useful in many applications, they are not currently useful for
packet filtering. First, packet filtering must complete within
a specified amount of time, which generally forces the value
of n to be small relative to asymptotic bounds, but
routers typically filter packets with a number of filter rules
in the range of a few thousand to tens of thousands.
Consequently, even point-location algorithms with poly-logarithmic
time bounds are not practical for use in a
high-speed router.

For example, router 245 desirably processes n=1K filter
rules of D=5 dimensions within 1 μs to sustain a 1 million-packets-per-second
throughput. However, an algorithm
employed with O(log^(D-1) n) complexity and O(n) space has
a log^4 1024 execution time and O(1024) space, which
requires 10K memory accesses (look-ups) per packet. If an
O(log n) time O(n^4) space algorithm is employed, then the
space requirement becomes prohibitively large (greater than
1000 Gigabytes).
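
The figures in this example can be checked with a short calculation. The sketch below assumes a base-2 logarithm and one byte per table entry; neither assumption is stated in the text, and the variable names are illustrative.

```python
import math

n = 1024   # number of filter rules (n = 1K)
D = 5      # number of dimensions

# O(log^(D-1) n) time: (log2 1024)^4 memory accesses per packet.
accesses_per_packet = int(math.log2(n)) ** (D - 1)

# O(n^4) space, using the exponent given in the text and one byte per entry (assumed).
space_bytes = n ** 4

print(accesses_per_packet)          # look-ups needed within the 1 microsecond budget
print(space_bytes / 1e9, "GB")      # storage for the O(log n) time alternative
```

Under these assumptions the first alternative needs 10,000 look-ups per packet and the second needs roughly 1,100 Gigabytes, consistent with the values quoted above.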

For the special case of two dimensions, the filter rules
defined for field ranges are modeled as objects in two
dimensions, for example, forming rectangles in the
2-dimensional space. For a 2-dimensional space having
non-overlapping rectangles, some packet filter algorithms
have logarithmic complexity and near-linear space complexity.
However, these algorithms do not consider the special
problem related to arbitrary overlapping rectangles in the
multi-dimensional space requiring a decision of which overlapping
filter rules to apply to a packet. The problem may be
resolved through a priority of the longest field prefix. An
algorithm of the prior art where the time complexity is
O(log(log N)) is based on stratified tree searches in a finite
space of discrete values. Examples of these algorithms are
discussed in, for example, M. De Berg, M. van Kreveld, and
J. Snoeyink, Two- and Three-dimensional Point Location in
Rectangular Subdivisions, Journal of Algorithms,
18:256-277, 1995. Data structures employed by this prior
art algorithm require a perfect hashing operation in every
level of the tree. The pre-processing complexity, without
using a randomized algorithm, of calculating the perfect
hash is O(min(hV, n^3)), where h is the number of hash
functions that must be calculated and V is the size of the
space. Consequently, for a 2-dimensional space, longest-prefix
lookups may result in executions requiring 2^32 cycles,
even for a relatively small number of filter rules, even if
pre-processing is only required once every several seconds.

SUMMARY OF THE INVENTION 

The present invention relates to associating
at least one filter rule with a
packet characterized by values in first and second
dimensions, the filter rule to be applied to the packet by a
router in a communications network. In accordance with an
exemplary embodiment, a filter-rule table is provided with
each entry of the filter-rule table corresponding to a prefix
value having a length in the first dimension and at least one
interval in the second dimension. Each prefix value matching
the value of the packet in the first dimension is identified,
and each interval corresponding to identified prefix values
containing the value of the packet in the second dimension
is retrieved. A solution interval is determined as the interval
associated with the prefix value associated with a predetermined
metric and containing the value of the packet in the
second dimension; and the filter rule corresponding to the
solution interval is associated with the packet.
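
The look-up just summarized can be sketched in Python under several assumptions: the table is a dictionary keyed by prefix length and then by prefix bit string, each entry is a sorted list of non-overlapping (low, high, rule) intervals, and the predetermined metric is the longest matching prefix. The function classify and the sample table are hypothetical and only illustrate the idea.

```python
from bisect import bisect_right


def classify(table, s_bits, d_value):
    """Return the rule of the longest matching prefix whose interval contains d_value."""
    best = None
    for length in sorted(table):                       # shortest prefix first
        intervals = table[length].get(s_bits[:length])  # prefix match in first dimension
        if not intervals:
            continue
        starts = [low for low, _high, _rule in intervals]
        pos = bisect_right(starts, d_value) - 1         # interval search in second dimension
        if pos >= 0 and intervals[pos][1] >= d_value:
            best = intervals[pos][2]                    # a longer prefix overrides a shorter one
    return best


# Hypothetical filter-rule table: prefix length -> prefix -> sorted intervals.
table = {
    1: {"0": [(0, 7, "rule_a")]},
    2: {"01": [(2, 3, "rule_b"), (4, 6, "rule_c")]},
}
print(classify(table, "0110", 5))
```

A different metric, such as an explicit rule priority, could replace the longest-prefix choice without changing the structure of the search.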

In accordance with another exemplary embodiment, the 
filter-rule table is created by first assigning each filter-rule to 
one or more prefix values based on the values in the first 
dimension; and then projecting, for each prefix value having 
the same length, values of each corresponding filter rule of 
the prefix value onto the second dimension to define at least
one filter-rule segment. Each filter-rule segment is decom- 
posed into one or more non-overlapping intervals associated 
with each prefix value having the same length and corre- 
sponding filter rule in the second dimension; and a pointer 
is generated for each non-overlapping interval identifying 
each filter rule contained in the non-overlapping interval.
The pointer is stored as an entry of the filter-rule table 
associated with a prefix value length and a non-overlapping 
interval. 
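
A minimal Python sketch of the decomposition step follows. It assumes inclusive integer segment bounds and a numeric priority for each rule, and it keeps only the highest-priority rule for each non-overlapping interval, which is one possible choice for what the stored pointer identifies. The function decompose and the sample segments are hypothetical.

```python
def decompose(segments):
    """Split overlapping segments into non-overlapping intervals.

    segments: list of (low, high, priority, rule) with inclusive integer bounds.
    Returns a sorted list of (low, high, rule) for the highest-priority rule
    covering each interval.
    """
    # A new interval can only begin where some segment starts or just after one ends.
    cuts = sorted({low for low, _, _, _ in segments} |
                  {high + 1 for _, high, _, _ in segments})
    intervals = []
    for start, nxt in zip(cuts, cuts[1:]):
        covering = [(prio, rule) for low, high, prio, rule in segments
                    if low <= start and nxt - 1 <= high]
        if covering:  # skip gaps that no segment covers
            _, rule = max(covering, key=lambda pr: pr[0])
            intervals.append((start, nxt - 1, rule))
    return intervals


# Two projected segments for one prefix value: r2 lies inside r1 and has higher priority.
segments = [(0, 9, 1, "r1"), (3, 5, 2, "r2")]
print(decompose(segments))
```

Because a boundary occurs only at a segment start or end, k segments yield at most 2k+1 intervals, so the table grows by no more than a constant factor.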

BRIEF DESCRIPTION OF THE DRAWINGS 

Other aspects, features, and advantages of the present 
invention will become more fully apparent from the follow- 
ing detailed description, the appended claims, and the 
accompanying drawings in which: 

FIG. 1 shows a typical form of a data packet of a 
communications network; 

FIG. 2 shows a router of a network node receiving and 
forwarding packet streams; 

FIG. 3 illustratively depicts prefix ranges of a field in an 
s-dimension where the prefix ranges are a power of two; 

FIG. 4 illustratively depicts segments of a filter rule 
having one or more field ranges of destination addresses 
projected as horizontal intervals; 

FIG. 5 illustrates a 2-dimensional space for an exemplary 
packet filter in accordance with the first embodiment of the
present invention; 

FIG. 6 illustrates steps of an exemplary pre-processing
algorithm in accordance with the present invention; 

FIG. 7 illustrates steps of decomposing overlapping intervals
into non-overlapping intervals as shown in FIG. 6;

FIG. 8 illustrates steps of an exemplary classification 
algorithm in accordance with the present invention; 

FIG. 9A illustrates an example of a trie structure of an
exemplary embodiment employing virtual intervals to 
reduce search time of a classification algorithm; 

FIG. 9B illustrates an example of point propagation of an 
exemplary embodiment employing virtual intervals to 
reduce search time of a classification algorithm; 

FIG. 10 illustrates a hardware system for implementation 
of the packet filter in accordance with the present invention 
in a packet forwarding engine or router; 

FIG. 11 shows a filter processor receiving incoming 
packets, storing field parameters and classifying a packet in 
accordance with the present invention; and 

FIG. 12 shows an example memory organization of a 
filter-rule table for the system illustrated in FIG. 10, which
depicts a filter-rule. 

DETAILED DESCRIPTION 

For exemplary embodiments of the present invention, a 
packet filter associates a 2-dimensional filter rule with an 
arriving packet EP having fields S and D. For a unicast
forwarding packet filter, these values S and D may be source 
and destination address values, respectively, of the packet. 
For a multicast forwarding packet filter, the value S may be 






US 6,341,130 B1


the source address value of a packet and D a group identifier
(ID) that identifies the multicast group that the packet may
be forwarded to. The value for S may be contained in a range
of binary values s, s being associated with an axis in one
dimension (the s-dimension). Similarly, the value for D may
be contained in a range of binary values d, d being associated
with another axis in another dimension (the d-dimension).
The packet filter includes a set of n packet-filtering rules RP
having 2-dimensional filter rules r_1 through r_n to be associated
with the packet. Each filter rule r_m, m an integer greater
than 0, may be denoted as r_m={s_m,d_m}, which is a set of two
field ranges s_m and d_m in the s-dimension and d-dimension
that define the filter rule r_m in the 2-dimensional space.

To associate a filter rule with a received packet EP, the
packet filter employs a 2-dimensional interval search and
memory look-up with the filter-rule table. Locating a pair of
values S and D for fields of a packet EP and associating a
2-dimensional filter rule with the packet may be modeled as
a point-location problem in a 2-dimensional space. The
packet EP having field values S and D arrives at the router
and is defined as a query point (S, D) of a 2-dimensional
space. For the point-location problem where packet filtering
involves orthogonal rectangular ranges, a search in
2-dimensions of a 2-dimensional, orthogonal, rectangular
range decomposes each rectangle into a set of 1-dimensional
filter-rule intervals to allow 1-dimensional searches over
1-dimensional intervals.

For a simple embodiment, preprocessing of filter-rules
may construct the filter-rule table as a 2-dimensional look-up
table comprising filter-rule pairs (s_m,d_m), m an integer
greater than 0, where each s_m is a prefix of possible source
addresses and each d_m is a contiguous range, or a single
point, of possible destination addresses or group IDs. For the
table, each pair (s_m,d_m) defines a filter-rule rectangle r_m={s_m,
d_m} for the n packet-filtering rules r_1 through r_n in
2-dimensions, and rectangles may overlap. The point-location
in a 2-dimensional space operates as follows: given the
query point (S, D), the
algorithm for packet classification finds an enclosing filter-rule
rectangle r_m=(s_m,d_m) such that
(S, D) is contained in r_m, and such that s_m is the most specific
filter according to a predefined metric, such as, for example,
the longest matching prefix of field value S or the highest
priority rule for a given prefix length.

For Internet Protocol (IP) routers employing an algorithm
in accordance with the present invention, look-up tables may
have as many as 2^16 entries or more. Also, algorithms
employed may generally be evaluated based on worst-case
performance since queuing for header processing is desirably
avoided to provide a specific Quality of Service (QoS).
For the exemplary filter-rule table, a value n may be defined
to denote a number of entries in the table, for example a
multicast forwarding table, corresponding to the n filter rules
r_1 through r_n. An n×n array may be formed in a memory with
each entry representing the highest-priority filter-rule rectangle
of the n filter rules r_1 through r_n enclosing a point
corresponding to the coordinates represented by the entry.
An exemplary classification (i.e., look-up) algorithm that
employs this simple table may employ two binary searches,
one for each of the dimensions. This exemplary classification
algorithm may require O(log n) time and O(n^2) memory
space. The O(n^2) memory space is due to one rectangle
being represented in O(n) locations. Such a simple table might
not be preferred, however, for a high-speed router when the
number of filtering rules is n=2^16 or greater since the
required memory space or memory access time may be
excessive.

Consequently, preferred embodiments of the present
invention employ preprocessing of the filter-rules based on
prefix length as a power of 2 in one dimension and decomposition
of overlapping segments into non-overlapping
intervals in the other dimension to form the filter-rule table.
A packet filter of the present invention first searches in one
dimension through filter rules and arranges the corresponding
filter-rule rectangle segments according to prefix length.
Then, in the other dimension, the overlapping filter rectangle
segments are decomposed into non-overlapping intervals,
and the highest priority filter-rule overlapping each non-overlapping
interval is associated with that interval. A filter-rule
table is then constructed with entries ordered according
to prefix length and non-overlapping interval, each entry
associated with a particular filter-rule. This filter-rule table is
constructed before received packets are processed.
Packet classification in accordance with the present
invention then processes the received packets using the field
or other parameter information in the packet. The field or
other parameter information is matched to the filter-rule
table entries to identify the filter-rule rectangle associated
with the filter-rule to be applied to the packet.

In accordance with the present invention, values for each
s_m of r_m=(s_m,d_m) in the s-dimension are desirably ranges that
are a power of two. Consequently, prefix values ("prefixes")
define ranges ("prefix ranges") that are a power of two. The
length of a prefix is the number of specified bits of the prefix.
The prefix range is between a lower bound defined by the
prefix and unspecified bits set to logic "0" and the upper
bound defined by the prefix and unspecified bits set to logic
"1". The length may be represented by a binary value. The
d_m may be single points, ranges defined in a manner similar
to prefix ranges in the s-dimension, and/or ranges defined as
continuous ranges. When multiple matches of a same length
prefix occur for a specific value of s_m, the query point (S, D)
is associated with the highest priority filter rule having the
matching prefix of d_m, if an overlap also occurs in the
d-dimension.

FIG. 3 illustratively depicts prefixes and prefix ranges of
a field in an s-dimension where the prefix ranges are a power
of two. Field values s, which may be source addresses, vary
from 000 to 111 (binary). An address may be a point (e.g.,
010) or within a range (e.g., 010 to 101). For a special case,
prefix ranges may be a power of 2. For example, if a prefix
range is defined as 0xx, the prefix, represented as a single
value 0, specifies the range 000 to 011. For this example, the
prefix has a length of 1 corresponding to one specified bit.
Two prefixes of length 1 are possible: I_1^0 and I_1^1. If the
prefix has two bits, or a length of 2, then four prefixes are
possible: I_2^0, I_2^1, I_2^2, and I_2^3. Prefixes of different length
define prefix ranges that are different powers of two. The
prefix ranges do not overlap.

FIG. 4 illustrates an example of decomposition in the
d-dimension of a 2-dimensional filter-rule rectangle into
1-dimensional overlapping segment sets and then into non-overlapping
intervals. As described previously, values for
each d_m of filter rule r_m=(s_m,d_m) in the d-dimension may be
any contiguous range and are not necessarily restricted to
prefix ranges only. FIG. 4 shows a horizontal axis 429 for the
d-dimension representing, for example, parameter values for
IP destination addresses. The process searches through each
of the applicable filter rules r_1, . . . r_4 to be implemented in
the router for each dimension, and the process may be
implemented before processing of arriving packets. Each of
the filter rules r_1, . . . r_4 specifies field ranges such as
d_1, . . . d_4 for the d-dimension applicable to the particular
parameter of the packet header.
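
The prefix ranges described for FIG. 3 can be illustrated with a short Python sketch. It assumes prefixes are given as bit strings and the field width is known; the helper name prefix_range is hypothetical.

```python
def prefix_range(prefix: str, width: int):
    """Return the inclusive (low, high) range covered by a binary prefix.

    The lower bound sets every unspecified bit to 0 and the upper bound sets
    every unspecified bit to 1, so the range size is a power of two.
    """
    free = width - len(prefix)          # number of unspecified bits
    low = int(prefix + "0" * free, 2)
    high = int(prefix + "1" * free, 2)
    return low, high


# Prefix 0xx in a 3-bit field covers 000 to 011.
low, high = prefix_range("0", 3)
print(format(low, "03b"), format(high, "03b"))
```

Two prefixes of the same length always differ in at least one specified bit, which is why their ranges cannot overlap.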
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Field ranges d lr . . . d 4 are projected as overlapping in the s-dimension. Therefore, each of the filter-rule rect- 

horizontal line segments, with each segment specifying a angles in set RP/ may associated with each prefix P/ (j an 

start point "b," and end point "q/' of a range for a particular integer and l^j^np,). 

corresponding filter rule (i an integer greater than 0). For ^ yalue d / in (he dHlimensioD of me set of filter-rule 

example, d, specifies a first range of source addresses on a 5 recta ngles RP/'-fCP^d,. 1 ), (P/A 2 ), . - - , } is a range in the 

first segment defined by start point b, and end point q dKiiineasion tDat may overlap other ranges . M definedj the 

for filter rule h . Segments may overlap, such as those of d, ^ ^ > * ^ » , 2 

and d,. Consequently, segments are decomposed into non- - -* an ' an < each ' of 

overlapping intervals I (, » ^teger grea ter than 0) ^ ^ J^JJ ^ J g m £J Rp/ formed 

Therefore the segment denned by start point b, and end J0 Wh longer prefixes than mose filter rectangles in set RP/ if 

pomt"q." for filter ruler, has a single associated interval I,, . . . r , . . , * . . • 

u ( tL < j c j V . _* • * «i. » j . I i>t. A filter-rule havmg a longer prefix value in the 

but the segment denned by start point b, and end pomt . . , i* , . r r . . . . ., 

u „ c i * , * ♦ , i r t j i s-dmiension may be defined to have higher priority than 

q, for filter rule r, has three intervals I 1f U, and 1, lL c „ * «,* . , r . a " . r ' 

■ . j 2 , i-pi 1 2 ' , - 3 other filter-rules with shorter prefix length since they are 

associated with filter rule r,. These three non-overlapping . t . . . r c , , , 

i t t j t , 4 f . - a. more speafic with respect to, for example, packet source 

intervals I,, I,, and L are a result of decomposing the 1<: ^_ ei , ' , / h . nn , , 

i j . r ci. i j . , , 15 address. Consequently, if filter-rule rectangles in RP/ and 

overlapped segments of filter rules r„ u, and r, at start or „ , - . _i /c . ^ . , c ,j , • . L 

j v . ( i 1 t H Z cn i Rp / match a pomt EP=(S, D) based on field values in the 

end points. It should be understood that for each filter rule, . \. \/ ' , . , I1T , , . 

r , , r j s-dimension. then the filter-rule associated with RP/ is 

a range of source addresses and a range of destination . . . . _„ , ... ... n '.■ . 

" , . , j apphed to packet EP. The filter-rule associated with RP/ is 

addresses, for example, may be specified ™ ^ * ^ since ^ . q ■ &re formed ^ 

As described previously, values m the s-dimension of 20 b efixes ^ those rectan ^ es formed ^ Rp / 
each rectangle desirably have lengths of a power of 2 when t . , . e , ,. ri 
the values in the s-dimension are defined as prefix ranges. f or thc ^-dimension, the size of the fist of the set of d/ 
Ranges in dimensions being prefix ranges provide con- ^ ues d 5 fined " ^ k ™ mle g" S reater man / 1 ' 
straints such as illustrated in FIG. 3. When prefix range *T f ach ^ J »nges in a rule set RP, compnsing (s„ 
intervals have lengths which are powers of two, arbitrary « 6 p> a I 151 °J non-overlapping intervals ID/ is formed along 
overlapping of filter-rules for the dimension does not occur ihc °l thc d-Jmcnaon from filter-rule segments Id/ 
since two prefixes of the same length do not overlap. Also, corresponding to the values of d/ The size of this new set 
a prefix range interval starts from an even-value point and of intervals ID/ may be K/S2k/ri. By representing the 
terminates at an odd-value point. Consequently, a set of on Z m *} k ' overlapping intervals as non-overlapping 
prefix ranges form several distinct cells distinguished by the 30 Nervals, a memory space requnemcnt of the packet filter 
length of the prefix or, equivalently, the length of the range. ma y bc * c "*sed by only a constant factor of 2. 
Further, values for each d m of filter rule r m =(s m ,dj in the For the d-dimension, if the values for d/ are defined to be 
d-dimension may be any contiguous range, such as illus- P refix ran S es > mea the projected filter-rule segments Id/ 
trated in FIG. 4, and are not necessarily restricted to prefix ^ong the d-dimension axis do not overlap, and so the Id/ 
ranges unless the value for dL is defined as a prefix range. 35 become the fist of non-overlapping intervals ID/. 
However, modifying the packet filter in accordance with the Fur the general c^ce, replacing overlapninp intervals by 
present invention to define values for d m as prefix ranges non-overlapping intervals allows a search algorithm to 
may be desirable, such as if destination addresses are locate the field value D from the query point (S, D) on one 
concatenated with layer-4 destination ports or some other of these non-overlapping rectangles during the search pro- 
similar header field. 40 cedure. The search algorithm then retrieves the associated 

In accordance with the present invention, filter-rule table enclosing rectangle of the non-overlapping rectangles rep- 
cells for prefix ranges and associated non-overlapping inter- resenting the filter rule to be applied to the packet, 
vals are defined containing pointers to filler-rules as entries Consequently, when many filter-rule rectangles overlap a 
in the filter-rule table in the following manner. Given each S ivei1 interval in the d-dimension, the particular filter-rule 
rule r^s^d^), for the field range s, that is an integer power 45 rectangle associated with the given interval when non- 
of 2, the length is defined as l si bits and for the field range overlapping intervals are formed is the filter-rule rectangle 
d, the length is defined as 1^ bits. The maximum values of with the highest priority that overlaps the interval, 
lengths \ si and l dl are defined as l sMAX and 1^^^ respec- FIG. 5 illustrates a 2-dimensional space for an exemplary 
lively. The set of prefixes having a length of i bits are packet filter in accordance with the first embodiment. FIG. 
denoted as P f , i<0=\ sM Ax- As described with respect to FIG. 50 5 shows a total of np!=2 prefixes of length i equal to 1 (i.e. 
3, there may bc several different prefixes of a given length Oxxx and lxxx). For the set of rectangles RPj with prefix 
i, i.e. the set of prefixes of length 1 (Pj) may have up to two length i equal to 1, the corresponding set of filter-rule 
elements, prefixes starting with "0" and prefixes starting rectangles is RP 1 -{el,.e6}. Also shown is a total of np 2 »l 
with "1". The value np,- denotes the number of elements in prefixes of length i equal to 2 (i.e., Olxx) for the set RP 2 of 
the set of prefixes of length i (P,) that are present in the 55 filter-rule rectangles formed with prefixes of length i equal 
lookup table. The elements of the set of prefixes of length i to 2. The set RP 2 includes the filter-rule rectangles {e2, e3, 
(P,-) may be numbered in ascending order of their values; e4}. These filler-rule rectangles may overlap on the axis of 
consequently, thc np,- prefixes of the set P,- are defined as the the d-dimension. Similarly, set of filter-rule rectangles RP 3 
set {P^P, 2 , . . . Pf**}. with prefix of length i equal to 3 (i.e., Ollx) contains one 

The set of filter-rule rectangles RP={RP„ RP 2 , . . . , 60 filter-rule rectangle eS. 

RPuteAx) ^ defined such that each RP; is a subset of the set For the illustration shown in FIG. 5, the set of intervals 

of n filter rule rectangles RPsuch that subset RP ( includes all given a prefix length of 2 that are created after this overlap 

filter-rule rectangles formed from s value prefixes having a elimination for each Id 2 J is ID 2 1 ={a 0 , a lf ... a 6 }. Filter-rule 

length of i bits. Further, each subset RP, may be defined as rectangles e2 and e3 overlap in the d-dimension. Filter-rule 

the union of the sets of filter-rule rectangles RP/={(P/,d/), 65 rectangle e3 of the set of rectangles RP 2 J is associated with 

(P/jd, 2 ), . . . ,} where each filter-rule rectangle RP/ has the interval ^ since this filter-rule rectangle may be defined to 

}Vt prefix of length i (P/) as a side of the filter-rule rectangle have the higher priority than filter rule rectangle e2. 
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Consequently, only this filter-rule rectangle e3 is associated FIG. 6, first, at step 701 the overlapping intervals Id/ are 

with interval % even though another filter-rule rectangle sorted into an ascending sequence based on interval starting 

with lower priority overlaps this range points. Then, at step 702, for all j, if an overlapping interval 

For the exemplary system of FIG. 5, a packet EP with Id/ starts or ends, an assigned, non-overlapping interval ID/ 

header field values (S-0110, D-0101) arrives. First, a 5 is generated for previous interval. For step 604 of FIG. 6, at 

matching prefix of length 1 from S-(0) is found and a search s t C p 703, the assigned, non-overlapping intervals ID/ and 

performed for enclosing rectangles formed with this prefix. corresponding pointer to actions for the highest-priority 

The d-dimension is searched and filter-rule rectangle el fii ter .rule rectangle overlapping this interval are stored in 

shown in FIG. 4 is a first candidate rule, or is the current memory . Optionally, at step 704 the newly created interval 

solution. Note that rectangles el and e6 of FIG. 5 are the w and thc viousIy storcd adjaccnl mtcrval are mmpmdf 

only rectangles in the set of rectangles with prefixes of length equal to 1.
Next, a search for the matching prefix (01) is performed over the prefixes
of length 2. Rectangle e3 is determined to be a better candidate rule since
1) the D value of the arriving packet overlaps with the range a2, 2) this
filter-rule rectangle e3 is formed with a longer prefix than rule e1, and
3) this filter-rule rectangle has higher priority than other rectangles
formed with prefixes of equal or lower length. Finally, a matching prefix
(001) of length 3 is located and a search among rectangles with this prefix
is performed, resulting in the rule of rectangle e5 as the best solution.

A packet filter of the present invention for a router employs an algorithm
having two parts. The first part is a pre-processing algorithm that searches
through [illegible in source] and decomposes the filter rules for each
dimension. The first part is performed [illegible in source] received
packets. A second part is a classification algorithm that processes received
packets [illegible in source] parameter information in accordance with the
filter rules of the pre-processing algorithm.

An exemplary pre-processing algorithm for a packet filter in accordance with
the present invention is shown and described with respect to FIG. 6 and
FIG. 7. The pre-processing algorithm performs three operations to decompose
the n filter-rule rectangles. First, the filter-rule rectangles are separated
based on the prefix length in the s-dimension. Second, for each prefix of
length i, all associated filter-rule rectangles are projected onto the
corresponding axis in the d-dimension to obtain first the overlapping
intervals Id_i^j. Third, a set of non-overlapping intervals ID_i^j is created
from these overlapping intervals Id_i^j. The non-overlapping intervals may be
created by a scan of the overlapping intervals from lower to higher
coordinates in the d-dimension.

FIG. 6 illustrates a flowchart of an exemplary pre-processing algorithm in
accordance with the present invention. First, at step 601 the set of prefixes
P_i^j (as defined previously) for all i and j, 1 ≤ i ≤ l_sMAX and
1 ≤ j ≤ np_i, is stored in memory according to, for example, an efficient
trie representation. Then, at step 602 for each filter-rule having prefix
P_i^j, the corresponding set of filter-rule values d_i^j in the d-dimension
are projected as overlapping segments Id_i^j. At step 603, for all P_i^j
(i.e., for all j prefixes of length i, 1 ≤ i ≤ l_sMAX and 1 ≤ j ≤ np_i), the
overlapping segments Id_i^j are decomposed into a set of non-overlapping
intervals ID_i^j. At step 604 a pointer is constructed to identify the
highest priority filter-rule rectangle overlapping the associated
non-overlapping interval for all intervals of the set ID_i^j. At step 605,
the set of non-overlapping intervals ID_i^j is stored with associated prefix
P_i^j as a table entry in the filter-rule table. Each entry of the
filter-rule table corresponds to the pointer identifying actions to be
applied to a packet for a corresponding filter rule. The list of
non-overlapping intervals ID_i^j may be stored in sorted sequence using
either an array or a binary tree. At step 606, the algorithm returns to step
602 if i < l_sMAX, or until all prefix lengths P_i are processed.

FIG. 7 is a flowchart illustrating the decomposition of intervals of the
steps 603 and 604 of FIG. 6. For step 603 of [illegible in source] Since a
[illegible in source] when an overlapping interval begins or terminates, the
size of this new set of intervals is [illegible in source], where
[illegible in source] is the size of the set of overlapping intervals Id_i^j.

In accordance with the pre-processing algorithm of the packet filter, each
filter-rule is associated with a pointer in one or more filter-rule table
entries. Each filter-rule pointer is stored in exactly one address in memory
corresponding to prefix and prefix length on the s-dimension axis, and one or
more addresses corresponding to non-overlapping intervals on the d-dimension
axis. The set of filter-rule rectangles associated with a prefix is stored as
a list of non-overlapping intervals and requires space only proportional to
the size of the set. Only O(n) memory space may be utilized to store all the
rectangles since each rectangle appears only in one set and therefore the
size of the union of all [illegible in source] O(n) [illegible in source]
processed.

Once the preprocessing algorithm creates the filter-rule table, the
classification algorithm performs a look-up search of the filter-rule table.
FIG. 8 illustrates an exemplary flow-chart of the classification algorithm of
the packet filter. The classification algorithm may begin at step 801. First,
at step 801, prefixes of length i, {P_i^1, P_i^2, ..., P_i^np_i}, are
identified. Initially, the value of i may start from the shortest prefix
length, i.e., i = 1. Next, at step 802 the prefix P_i^j of length i with s_i
matching the corresponding S in the s-dimension is determined. If no match of
S with s_i in P_i^j is found at step 802, then the algorithm moves to step
805. At step 805, the prefix length i is incremented until the prefix length
l_sMAX is reached (i.e., increment i if i < l_sMAX). Consequently, the
classification algorithm repeats for each prefix length until all prefix
lengths have been searched.

If a match of S with an s_i in P_i^j is found at step 802, then at step 803
the stored structure in the d-dimension associated with P_i^j is searched to
find the non-overlapping interval ID_i^m that contains the query point D in
the d-dimension. At step 804 the current solution is set as the pointer
associated with table entry (P_i^j, ID_i^m) (m an integer greater than 0).
The current solution may be the "best" solution among all prefix lengths
searched so far if shorter prefix lengths correspond to lower priority rules,
and the search begins at the shortest prefix (lowest priority) and goes to
the longest prefix (highest priority). The algorithm then moves to step 805.

The number of iterations of the classification algorithm in the worst case is
equal to the largest number of possible prefix lengths, which is l_sMAX.
Consequently, the total time for searching through all prefix lengths is
O(l_sMAX) times the time to search a list for a prefix length. In addition,
the size of the lists of ID_i^j for a prefix length may be O(n) since there
are n filter-rules. Hence, an average O(log n) time is needed to search each
list for a matching entry. The worst case total execution time of the
exemplary classification algorithm is, therefore, O(l_sMAX log n).

However, for large numbers of table entries, worst case performance may not
be sufficient for available processor speed. For example, if a number of
possible prefix lengths
l_sMAX is 32 and the number of table entries n is 2^18 = 256K,
this exemplary classification algorithm may perform 576
(i.e., 32 x 18) memory accesses in the worst case, which may be
prohibitively high. An alternative embodiment of the present
invention employs a trie structure with virtual intervals for storage
of data in memory to reduce the worst-case time complexity
O(l_sMAX log n) to a time complexity O(l_sMAX).
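
The one-binary-search-per-prefix-length cost discussed above can be seen in a minimal sketch of the basic table build and look-up. This is an illustration under stated assumptions, not the patented implementation: the function names are hypothetical, a Python dict keyed by prefix bit-string stands in for the trie, destination ranges are treated as inclusive integer ranges, and a lower priority number is assumed to mean higher priority.

```python
from bisect import bisect_right

def decompose(segments):
    """Split overlapping segments into non-overlapping intervals.

    segments: list of (lo, hi_exclusive, rule_id, priority).
    Returns (points, best) where interval k is [points[k], points[k+1])
    and best[k] is the highest-priority rule covering it (or None).
    """
    points = sorted({p for lo, hi, _, _ in segments for p in (lo, hi)})
    best = []
    for lo, hi in zip(points, points[1:]):
        covering = [(prio, rule) for s_lo, s_hi, rule, prio in segments
                    if s_lo <= lo and hi <= s_hi]
        # Assumption: the smallest priority number wins.
        best.append(min(covering)[1] if covering else None)
    return points, best

def build_table(rules):
    """rules: list of (prefix_bits, lo, hi_inclusive, rule_id, priority)."""
    by_prefix = {}
    for prefix, lo, hi, rule, prio in rules:
        # Inclusive integer range [lo, hi] becomes half-open [lo, hi + 1).
        by_prefix.setdefault(prefix, []).append((lo, hi + 1, rule, prio))
    return {prefix: decompose(segs) for prefix, segs in by_prefix.items()}

def classify(table, s_bits, d, max_len):
    """Search prefix lengths 1..max_len; a later (longer) match overrides."""
    solution = None
    for length in range(1, max_len + 1):
        entry = table.get(s_bits[:length])
        if entry is None:
            continue
        points, best = entry
        k = bisect_right(points, d) - 1      # one binary search per length
        if 0 <= k < len(best) and best[k] is not None:
            solution = best[k]
    return solution

# Three segments sharing the prefix 0 (priorities are assumed values).
table = build_table([("0", 4, 7, 2, 2), ("0", 8, 12, 4, 4), ("0", 8, 15, 5, 5)])
print(classify(table, "0010", 13, 4))        # prints 5
```

With 32 prefix lengths and a binary search over up to 2^18 intervals at each length, the loop in `classify` performs on the order of 32 x 18 = 576 probes, which is the worst-case figure quoted above.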

A trie structure may be employed for data storage with a
memory space requirement that may be O(n). Furthermore,
the order of search for the sets of filter-rules RP_1, RP_2, ...,
RP_lsMAX may be organized by increasing order of prefix lengths. For
example, a set of intervals from RP_1 is searched before
searching a set of intervals from RP_2, and so on. The search
proceeds in levels L_i, with a search of sets belonging to RP_1
being on the first level L_1, those in RP_2 being on the second
level L_2, and so on. The number of non-overlapping intervals
in all of RP_i is defined as N_i. The root (i.e., bottom-most)
level L_1 has non-overlapping intervals, and this level
may be RP_1 with N_1 non-overlapping intervals. The number
of overlapping intervals at each level without introducing
virtual intervals may be O(n). In accordance with the present
invention, introducing "virtual" intervals decreases search
time of the classification algorithm in multiple ordered lists.
If elements of a set of intervals are arranged by employing
virtual intervals as described below, the worst case execution
time may be O(l_sMAX + log n).

A search of the list of non-overlapping intervals at level
L_i, for example, yields a result of the point D, where D is in
an interval ID_i^j. A search of the lists at the next level L_(i+1) is
performed, instead of searching through the remaining intervals
at level L_i. In general, the result of the previous search
at level L_i may be used for the search at level L_(i+1), and the
search at level L_(i+1) is performed for only those intervals that
fall in the range of intervals ID_(i+1)^j in level L_(i+1) given by the
interval ID_i^j at L_i. For this case, at each level L_(i+1)
there may be O(n/l_s) intervals which fall within the range
determined by ID_i^j. Hence, an O(log(n/l_s)) = O(log n) search
may be needed at every level.

Consequently, virtual intervals at levels L_1 to L_lsMAX are
defined in the following manner. The number of intervals N_i
is defined at level L_i. Boundary points that demarcate the N_i
intervals in the d-dimension at level L_i are denoted by y_1,
y_2, ..., with a maximum of 2N_i such points. Every other
point at level L_i is replicated at level L_(i-1), and up to 2N_i
points are so propagated to level L_(i-1). Although the present
embodiment is described using propagation of every other
point, other embodiments may skip NS points, NS an integer
greater than 1, or may vary the number of points skipped
according to granularity of the pointers used.

The points that were propagated, together with the points
defining original non-overlapping intervals ID_(i-1)^j, now define
intervals at level L_(i-1) as new intervals VD_(i-1)^j. These
intervals are stored as non-overlapping intervals at level L_(i-1).
Next, for all the intervals at level L_(i-1) and their associated
points, every other point is replicated and propagated as
virtual points to level L_(i-2). This propagation process is
repeated until the root level (i.e., L_1) is reached. Note
that the propagation process is employed to speed up the
search; at each level, the filter-rule rectangles associated
with each non-overlapping interval are as described in the
preprocessing algorithm described previously. Virtual intervals
and points that result from propagation are desirably
ignored for association of filter-rule rectangles with
non-overlapping intervals.
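
The propagation just described can be sketched as follows. This is a hedged illustration only: the function name `propagate`, the list-of-lists layout (index 0 is the root level L_1), the `skip` parameter, and the sample boundary points are all assumptions made for the example. Each point carries a flag so that virtual points can be ignored when rules are associated with intervals.

```python
from heapq import merge

def propagate(levels, skip=2):
    """Propagate every `skip`-th boundary point toward the root level.

    levels[0] holds the sorted original boundary points of L_1 (root),
    levels[-1] those of the deepest level. Returns, per level, a sorted
    list of (point, is_virtual) pairs.
    """
    out = [None] * len(levels)
    out[-1] = [(p, False) for p in levels[-1]]       # deepest level: originals only
    for i in range(len(levels) - 2, -1, -1):
        sampled = [(p, True) for p, _ in out[i + 1][::skip]]
        original = [(p, False) for p in levels[i]]
        # Both inputs are sorted, so a linear merge suffices; a propagated
        # point may duplicate an original point and is kept as a separate entry.
        out[i] = list(merge(original, sampled))
    return out

# Hypothetical boundary points for three levels.
levels = [[4, 7, 8, 12, 15], [12, 15], [0, 3, 5, 9, 11, 13]]
augmented = propagate(levels)
virtual = sum(flag for level in augmented for _, flag in level)
original = sum(len(level) for level in levels)
print(virtual, original)                              # prints 6 13
```

In this small example the propagation adds 6 virtual points to 13 original ones, which illustrates why sampling every other point keeps the added storage bounded by a constant factor.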

The propagation process increases memory space requirements
by a constant factor, and so the total memory space
requirement is still O(n). A maximum amount of virtual
intervals created and corresponding maximum memory
space is when N_lsMAX = n, n being the number of filter rules,
in which case the number of boundary points at level L_lsMAX
is 2n. The extra memory space due to the propagations is
then as given in equation (1)
Increasing the memory space by a constant factor,
however, allows for searching of multiple lists (i.e., lists of
non-overlapping intervals at each level) efficiently. A packet
EP = (S, D) arrives at the packet filter and is processed by the
classification algorithm with a filter-rule table organized in
accordance with the alternative embodiment. A first level,
i.e., list of non-overlapping intervals VD_1^j, is searched as
described previously with respect to the classification
algorithm, taking O(log n) time for the worst case. This
search results in locating the given point D in an interval
VD_1^j that may be a virtual interval propagated from the level
L_2. With D localized to this interval VD_1^j, a search in the next
level L_2 searches in the range of intervals given by VD_1^j.
Because every other point has been propagated up from
level L_2, only 2 intervals in VD_2^j may fall within the interval
VD_1^j to which D has been localized. Hence, the search at
level L_2 may be completed in O(1) time. In general, in
moving from level L_i to level L_(i+1), the propagation of
intervals allows enough information gained in the search at
level L_i to be employed so that the search at level L_(i+1) takes O(1)
time. Hence, the worst case execution time of the look-up
algorithm of the alternative embodiment is O(l_sMAX + log n).
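
The level-to-level search described above can be sketched with the classic fractional-cascading construction, which the every-other-point propagation closely resembles. The sketch is not taken from the patent: the names `build` and `locate`, the pointer tables `own` and `down`, and the sample points are assumptions made for illustration. Only the first level uses a full binary search; each later level starts from a stored pointer and walks back a short distance.

```python
from bisect import bisect_left

def build(levels):
    """levels[i]: sorted original boundary points of level L_(i+1)."""
    k = len(levels)
    merged = [None] * k
    merged[-1] = list(levels[-1])
    for i in range(k - 2, -1, -1):
        # Own points plus every other point of the next (deeper) level.
        merged[i] = sorted(levels[i] + merged[i + 1][::2])
    # own[i][p]: index of the first original point of level i >= merged[i][p].
    own = [[bisect_left(levels[i], p) for p in merged[i]] for i in range(k)]
    # down[i][p]: index of the first point of merged[i+1] >= merged[i][p].
    down = [[bisect_left(merged[i + 1], p) for p in merged[i]]
            for i in range(k - 1)]
    return merged, own, down

def locate(levels, tables, d):
    """For every level, return the index of the first original point >= d."""
    merged, own, down = tables
    result = []
    pos = bisect_left(merged[0], d)          # the only full binary search
    for i in range(len(levels)):
        at_end = pos == len(merged[i])
        result.append(len(levels[i]) if at_end else own[i][pos])
        if i + 1 < len(levels):
            nxt = len(merged[i + 1]) if at_end else down[i][pos]
            while nxt > 0 and merged[i + 1][nxt - 1] >= d:
                nxt -= 1                     # short walk back from the pointer
            pos = nxt
    return result

levels = [[4, 7, 8, 12, 15], [12, 15], [0, 3, 5, 9, 11, 13]]
tables = build(levels)
print(locate(levels, tables, 13))            # prints [4, 1, 5]
```

The index returned for each level identifies the interval containing D at that level. Because only every other point is propagated, the walk back from each pointer is short when boundary points are distinct, so the total work is one binary search plus a small amount of work per level.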

FIG. 9A and FIG. 9B illustrate an example of an alternative
embodiment of the packet filter employing virtual intervals
to reduce search time of the classification algorithm. FIG.
9A illustrates a trie structure employed to search prefix
values of fourteen exemplary filter rules in ascending order
of length. FIG. 9B shows creation of virtual intervals for
levels of a portion of the trie structure shown in FIG. 9A. For
the exemplary embodiment of FIG. 9A and FIG. 9B, Table
1 provides a list of filter-rules with corresponding prefix
values and lengths for source fields and destination field
ranges.



TABLE 1

Filter-Rule   Source         Source          Destination range d
Number        Prefix Value   Prefix length   (lower bound, upper bound)

1             11*            2               (0,15)
2             0*             1               (4,7)
3             00*            2               (12,15)
4             0*             1               (12,15)
5             0*             1               (8,15)
6             10*            2               (8,15)
7             001*           3               (8,15)
8             000*
A packet EP with fields S=0010 and D=1101 arrives in the
system. Referring to FIG. 9A, a search of the trie structure
900 (the trie search) in the s-dimension begins at the root
level 901 (level 0) to determine if the source address
(S=0xxx) begins with a 0 (state 902) or a 1 (state 903). This
is a search of the set of prefixes of length 1. The trie search
moves to the state 902 at level 1 corresponding to the prefix 0xxx of
length 1. Similarly, at level 2 the trie search determines if the next bit
of the source address (S=00xx) is a 0 (state 904) or a 1 (state 905). The
trie search moves to the state 904 at level 2 corresponding to the prefix
00xx of length 2. Finally, at level 3 the trie search of a portion of the
set of prefixes of length 3 determines if the next bit of the source address
(S=001x) is a 0 (state 908) or a 1 (state 909). The trie search moves to the
state 909 at level 3 corresponding to the prefix 001x of length 3. For
searches of prefixes, only a portion of sets of prefixes are searched in the
tries. Consequently, states 903, 906 and 907 are not reached since the trie
search moves from state 901 to state 902, to state 904.

FIG. 9B illustrates an example of virtual intervals and point propagation to
reduce search time of the classification algorithm. First, non-overlapping
intervals in the d-dimension are shown for selected states at each level. For
example, at level 1, state 902 corresponds to the prefix of length 1 being
0xxx. The filter-rules of this prefix 0xxx (from Table 1) are rules 2, 4 and
5 with respective filter-rule segments (decimal ranges in the d-dimension) of
(4,7), (8,12) and (8,15). These filter-rule segments are then decomposed into
non-overlapping intervals (4,7), (8,12) and (12,15). Without virtual
intervals, the trie search at level 1 searches these three intervals to find
the value D=1101 (i.e., 13 decimal) included in the third non-overlapping
interval (12,15) associated with rule 5. However, for the next level 2, the
information of this search is lost.

Referring to FIG. 9B, the non-overlapping intervals of the highest level,
level 3, are shown for the states 908 and 909. Points of these original,
non-overlapping intervals at level 3 are propagated to the non-overlapping
intervals at level 2. Brackets in FIG. 9B indicate original, non-overlapping
intervals. For the example shown, alternate points of the intervals of the
left state 908 (next bit 0) and right state 909 (next bit 1) are inserted
into the non-overlapping intervals of the states of the next level 2, but as
described previously the present invention is not so limited. For example,
virtual intervals (0,3), (3,4), (5,6), (6,9), (9,11), (11,12), (12,13) and
[illegible in source] are created from the original non-overlapping interval
(12,15). Next, the alternate points of the intervals of state 904 are
propagated to level 1, and as shown, propagated points, such as 12, may be
duplicated in a level, since pointers are to be associated with the
intervals. Normally, points of left and right states are propagated, but for
the example of FIG. 9A and FIG. 9B, no rules or intervals are associated with
state 905.

As the trie search of prefixes as shown in FIG. 9A progresses, the search of
intervals is as shown in FIG. 9B. At level 1, state 902, the intervals in the
d-dimension are searched and the value of D=1101, 13 decimal, is determined
to be included in the interval (12,12,15). At level 2, after the prefix
search moves to state 904, the pointer associated with propagated point 12 in
interval (12,12,15) is employed to limit the search in level 2 to interval
(12,13,15). At level 3, after the prefix search moves to state 909, the
pointer associated with propagated point 13 in interval (12,13,15) is
employed to limit the search in level 3 to interval (12,13), associated with
rule 13 of Table 1.

As described, the algorithm for computing the filters is largely implemented
in hardware and may be manufactured in application specific integrated
circuit (ASIC) form, or as a field programmable gate array (FPGA) that,
consequently, may operate at very high speed. FIG. 10 illustrates the
hardware system 1000 for implementation of the packet filter in accordance
with the present invention in a packet forwarding engine or router, including
an input line 1005 for receiving an incoming packet and a bi-directional CPU
interface line 1010 representing control and timing lines for purposes of
illustration. The incoming packet is input to a pipeline register 1025 for
temporary storage and is also input to each classification processor 1050.
Classification processor 1050 employs memory 1030 to identify a filter-rule
to be applied to the incoming packet. Field processor 1035 updates fields of
the packet stored in pipeline register 1025 based on the identified
filter-rule to be applied to the incoming packet. The details of
classification processor 1050 are now described with reference to FIG. 11.

FIG. 11 shows a classification processor 1050 that receives the incoming
packet and stores field parameters, e.g., source addresses and destination
addresses S and D, in a register 1176. Under the control of filter processor
1160 [illegible in source] and associated memory [illegible in source] of the
algorithm are performed whereby non-overlapping interval information
[illegible in source] 1179 for the [illegible in source]. Comparator 1180
performs a comparison to [illegible in source] associated with the D value of
the received packet. After the correct solution for a filter-rule rectangle
is found, its corresponding bitmap vector containing potential filter-rule
actions is provided from register 1179 along line 1190. From the resultant
bitmap vector, the CPU will apply the rule of highest priority, and performs
the action dictated by the filter rule upon the received packet stored in the
pipeline register 1025. Thus, the packet may be dropped or forwarded to
another destination on output line 1015.

The preprocessing algorithm of the present invention may be implemented in
the classification processor by filter-rule processing and table processing
modules. The filter-rule processing module may assign filter-rules to prefix
values and lengths in one dimension, project the filter-rule segments onto
the other dimension, and decompose the filter-rule segments into
non-overlapping intervals. The table-processing module may be employed to
coordinate memory organization and storage [illegible in source] the
necessary pointers [illegible in source] non-overlapping intervals for
particular prefix value addressing schemes.

[illegible in source] organization for the system is illustrated in FIG. 12,
which depicts a filter-rule table having a plurality of interval lists in one
dimension corresponding to each prefix length of another dimension, which may
be associated with the following respective filter parameters: 1) destination
addresses, and 2) source address. Entries of the filter-rule table are
generated as described previously, i.e., with respect to FIGS. 6 and 7, and
addressed by prefix values 1259a-1259d. Each filter-rule table is shown to
include an array 1260a-1260d of intervals to be searched corresponding to
prefix values as described above with reference to FIG. 8, and the
corresponding filter actions 1261a-1261d and the pointers 1262a-1262d.

While embodiments of the present invention are shown and described with
respect to searches in a given dimension ordered from shortest to longest
length, as would be apparent to one skilled in the art the present search
algorithms and/or filter-rule table structures may be varied. For example,
the search may be from the longest to the shortest prefix length, or from
initial to final prefix values in an ordered list of the set of prefix
values. Further, matching of packet field values with prefix values and
interval values is described herein using binary search techniques, but the
present invention is not so limited. As would be apparent to one skilled in
the art, other search techniques to match values may be employed, such as
employing a perfect hash method.
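
The bit-by-bit trie search of the source prefix described for FIG. 9A can be sketched as below. This is a minimal illustration, assuming a nested-dict binary trie; the function names and the `"rule_prefix"` marker key are hypothetical, and the prefix list is limited to the prefix values that appear in Table 1.

```python
def build_trie(prefixes):
    """Build a binary trie; each node is a dict keyed by '0' / '1'."""
    root = {}
    for prefix in prefixes:
        node = root
        for bit in prefix:
            node = node.setdefault(bit, {})
        node["rule_prefix"] = prefix         # marks a state that holds rules
    return root

def matching_prefixes(root, s_bits):
    """Walk the trie one bit per level, collecting every matching prefix."""
    matches, node = [], root
    for bit in s_bits:
        node = node.get(bit)
        if node is None:                     # no deeper state for this bit
            break
        if "rule_prefix" in node:
            matches.append(node["rule_prefix"])
    return matches

trie = build_trie(["11", "0", "00", "10", "001", "000"])
print(matching_prefixes(trie, "0010"))       # prints ['0', '00', '001']
```

For S=0010 the walk visits one state per level, corresponding to prefixes of length 1, 2 and 3, and never touches the branches for 1xxx or 01xx.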
It will be further understood that various changes in the
details, materials, and arrangements of the parts which have
been described and illustrated in order to explain the nature
of this invention may be made by those skilled in the art
without departing from the principle and scope of the
invention as expressed in the following claims.

What is claimed is: 

1. Apparatus for associating at least one filter rule with a
packet, each filter rule and the packet characterized by
values in first and second dimensions, the filter rule to be
applied to the packet by a router in a communications
network, the apparatus comprising:
a storage medium adapted to store a filter-rule table, each
entry of the filter-rule table corresponding to a prefix
value having a length in the first dimension and at least
one interval in the second dimension; and
a classification processor comprising:
a comparator adapted to identify each prefix value
matching the value of the packet in the first
dimension, and
a filter processor adapted to retrieve, from the filter-rule
table, each interval associated with each prefix value
identified by the comparator containing the value of
the packet in the second dimension,
wherein the filter processor identifies as a solution
interval the interval associated with the prefix length
characterized by an associated predetermined metric
and containing the second field, and
wherein the classification processor associates the filter
rule corresponding to the solution interval with the
packet.

2. The invention as recited in claim 1, wherein the
classification processor further comprises a pre-processor
including:

a filter-rule processing module adapted to:

assign each filter-rule to one or more prefix values
based on the values in the first dimension,

project, for each prefix value having the same length,
values of each corresponding filter rule of the prefix
value onto the second dimension to define at least
one filter-rule segment, and

decompose each filter-rule segment into one or more
non-overlapping intervals associated with each prefix
value of the same length in the second dimension;
and

a table-processing module adapted to generate a pointer
for each corresponding non-overlapping interval to
identify an included filter-rule, the table-processing
module adapted to store the pointer as an entry of the
filter-rule table associated with a prefix value length
and a non-overlapping interval.

3. The invention as recited in claim 2, wherein:
the filter-rule processing module further comprises:

assigning means for assigning each prefix value of the
same length to a corresponding level;

first projecting means for projecting, for the level
having prefix values of a first length, values of each
corresponding filter rule onto the second dimension
to define at least one filter-rule segment;

second projecting means for projecting, in each level
beginning at the level having prefix values having a
second length, 1) values of each corresponding filter
rule onto the second dimension to define at least one
filter-rule segment in a current level, and 2) selected
points of the at least one non-overlapping interval in
the previous level so as to define at least one virtual
interval in the second dimension; and

interval forming means for forming each filter-rule
segment and each virtual interval of the current level
into one or more non-overlapping intervals associated
with each prefix value having the same length.

4. The invention as recited in claim 3, wherein the first and 
second lengths are either 1) the longest and next longest 
lengths in a descending prefix length order, respectively, or 
2) the shortest and next shortest lengths in an ascending 
prefix length order, respectively. 

5. The invention as recited in claim 3, wherein the second 
projecting means projects, as selected points, every Nth 
point that defines either a start point or a stop point of each 
non-overlapping interval in the previous level, N an integer
greater than 1. 

6. The invention as recited in claim 2, wherein the values
of each filter rule in the second dimension are at least one
range being a power of 2, each range being projected as a
corresponding filter-rule segment to form the
non-overlapping interval in the second dimension.

7. The invention as recited in claim 1, wherein the values 
of each filter rule are field ranges, the field ranges in the first 
dimension being a power of two, and each prefix length 
defines a number of specified bits of the field range. 

8. The invention as recited in claim 1, wherein an entry of
the filter-rule table of the storage medium includes a pointer
identifying at least one filter rule contained in the
corresponding non-overlapping interval.

9. The invention as recited in claim 8, wherein each
filter-rule has an associated priority, and the pointer
identifies the filter-rule with the highest associated priority
contained in the corresponding non-overlapping interval.

10. The invention as recited in claim 8, wherein the values 
of each filter rule are field ranges, the field ranges in the first 
dimension being a power of two, and each prefix length 
defines a number of specified bits of the field range. 

11. The method as recited in claim 1, wherein the asso- 
ciated predetermined metric is either the prefix value having 
the longest prefix length, the shortest prefix length or the 
prefix length having a highest priority. 

12. A method of associating at least one filter rule with a 
packet, each filter rule and the packet characterized by 
values in first and second dimensions, the filter rule to be 
applied to the packet by a router in a communications 
network, the method comprising the steps of: 

a) providing a filter-rule table, each entry of the filter-rule 
table corresponding to a prefix value having a length in 
the first dimension and at least one interval in the 
second dimension; 

b) identifying each prefix value matching the value of the 
packet in the first dimension; 

c) retrieving, from the filter-rule table, each interval 
associated with each prefix value identified in step b) 
containing the value of the packet in the second dimen- 
sion; 

d) identifying, as a solution interval, the interval associ- 
ated with the prefix value characterized by an associ- 
ated predetermined metric and containing the value of 
the packet in the second dimension; and 

e) associating the filter rule corresponding to the solution 
interval with the packet. 

13. The method as recited in claim 12, wherein the step a) 
comprises the steps of: 

f) assigning each filter-rule to one or more prefix values 
based on the values in the first dimension; 

g) projecting, for each prefix value having the same 
length, values of each corresponding filter rule of the 
prefix value onto the second dimension to define at 
least one filter-rule segment; 

h) decomposing each filter-rule segment into one or more 
non-overlapping intervals associated with each prefix 
value of the same length in the second dimension; 
i) generating a pointer for each corresponding
non-overlapping interval to identify an included filter-rule;
and

j) storing the pointer as an entry of the filter-rule table 
associated with a prefix value length and a non- 
overlapping interval. 

14. The method as recited in claim 13, wherein: 
step g) further comprises the steps of: 

g1) assigning each prefix value of the same length to a
corresponding level;

g2) projecting, for the level having prefix values having 
a first length, values of each corresponding filter rule 
onto the second dimension to define at least one 
filter-rule segment, 

g3) projecting, in each level beginning at the level 
having prefix values having a second length, 1) 
values of each corresponding filter rule onto the 
second dimension to define at least one filter-rule 
segment in a current level, and 2) selected points of 
the at least one non-overlapping interval in the 
previous level so as to define at least one virtual 
interval in the second dimension; and 
step h) further comprises the step of: 

h1) forming each filter-rule segment and each virtual
interval of the current level into one or more
non-overlapping intervals associated with each prefix
value having the same length.

15. The method as recited in claim 14, wherein, for steps 
g2) and g3), the first and second lengths are either 1) the 
longest and next longest lengths in a descending prefix 
length order, respectively, or 2) the shortest and next shortest 
lengths in an ascending prefix length order, respectively. 

16. The method as recited in claim 14, wherein step g3)
projects, as selected points, every Nth point that defines
either a start point or a stop point of each corresponding
non-overlapping interval in the previous level, N an integer
greater than 1.

17. The method as recited in claim 13, wherein the values
of each filter rule in the second dimension are at least one
range being a power of 2, the projecting step g) projects each
range as a corresponding filter-rule segment in the second
dimension, and the decomposing step h) forms the
non-overlapping interval from the corresponding filter-rule
segment projected in step g).

18. The method as recited in claim 12, wherein the values 
of each filter rule are field ranges, the field ranges in the first 
dimension being a power of two, and each prefix length 
defines a number of specified bits of the field range. 

19. The method as recited in claim 12, wherein, for the 
filter-rule table provided in step a), an entry of the filter-rule 
table associated with a prefix value length and a non- 
overlapping interval includes a pointer identifying at least 
one filter rule contained in the corresponding non- 
overlapping interval. 

20. The method as recited in claim 19, wherein each 
filter-rule has an associated priority, and the pointer gener- 
ated in step i) identifies the filter-rule with the highest 
associated priority contained in the corresponding non- 
overlapping interval. 

21. The method as recited in claim 19, wherein the values 
of each filter rule are field ranges, the field ranges in the first 
dimension being a power of two, and each prefix length 
defines a number of specified bits of the field range. 

22. The method as recited in claim 12, wherein for step d) 
the associated predetermined metric is either the prefix value 
having the longest prefix length, the shortest prefix length or 
the prefix length having a highest priority. 

23. A method of storing at least one filter rule with values 
associated with first and second dimensions in a filter-rule 
table comprising the steps of: 



a) assigning each filter-rule to one or more prefix lengths 
based on the values in the first dimension; 

b) projecting, for each prefix length, values of each 
corresponding filter rule of the prefix length onto the 
second dimension to define at least one filter-rule 
segment, 

c) decomposing each filter-rule segment into one or more 
non-overlapping intervals associated with each prefix 
length and corresponding filter rule in the second 
dimension; 

d) generating a pointer for each corresponding non- 
overlapping interval to identify an included filter-rule; 
and 

e) storing the pointer as an entry of the filter-rule table 
associated with a prefix length and a non-overlapping 
interval. 

24. The method as recited in claim 23, wherein: 
step b) further comprises the steps of: 

bl) assigning each prefix value of the same length to a 

corresponding level; 
b2) projecting, for the level having prefix values of a 
first length, values of each corresponding filter rule 
onto the second dimension to define at least one 
filter-rule segment, 
b3) projecting, in each level beginning at the level 
having prefix values having a second length, i) 
values of each corresponding filter rule onto the 
second dimension to define at least one filter-rule 
segment in a current level, and ii) selected points of
the at least one non-overlapping interval in the 
previous level so as to define at least one virtual 
interval in the second dimension; and 
step c) further comprises the step of: 
c1) forming each filter-rule segment and each virtual
interval of the current level into one or more non- 
overlapping intervals associated with each prefix 
value having the same length. 

25. The method as recited in claim 24, wherein, for steps
b2) and b3), the first and second lengths are either 1) the
longest and next longest lengths in a descending prefix
length order, respectively, or 2) the shortest and next shortest 
lengths in an ascending prefix length order, respectively. 

26. The method as recited in claim 24, wherein step b3)
projects, as selected points, every Nth point that defines
either a start point or a stop point of each corresponding
non-overlapping interval in the previous level. 

27. The method as recited in claim 23, wherein the values 
of each filter rule are field ranges, the field ranges in the first 
dimension being a power of two, and each prefix length 
defines a number of specified bits of the field range. 

28. The method as recited in claim 23, wherein each 
pointer stored in the filter-rule table in step e) identifies each
filter rule contained in the non-overlapping interval. 

29. The method as recited in claim 23, wherein each 
pointer stored in the filter-rule table in step e) identifies the 
filter-rule with the highest associated priority contained in 
the corresponding non-overlapping interval. 

30. The method as recited in claim 23, wherein the values 
of each filter rule in the second dimension are at least one 
range being a power of 2, the projecting step b) projects each 
range as a corresponding filter-rule segment in the second 
dimension, and the decomposing step c) forms the non- 
overlapping interval from the corresponding filter-rule seg- 
ment projected in step b). 
ABSTRACT



A packet network employing a reservation-based protocol 
system includes routers having processing sections that 
schedule message processing of the protocol's control mes- 
sages adaptively based on link utilization. A scheduler of the 
processing section employs a round-robin scheduling with 
adaptive weight assignment to allocate processing capacity 
for control messages. For the RSVP protocol, for example, 
messages are grouped in classes, and link utilization of the 
packet flows for each message class is monitored. Weights 
corresponding to a portion of the processing section's pro- 
cessing capacity are allocated to each message class. The 
weights are defined based on link utilization for the message 
class and average message queue length. For processing 
sections monitoring multiple links, weights are further 
defined for super-classes based on overall link utilization. 
Weights may change as link utilization and average message
size change. With weights adaptively defined, the
processing section then processes each message class in a
cyclic, "round-robin" fashion.

15 Claims, 6 Drawing Sheets 
ADAPTIVE PROCESSOR SCHEDULOR AND METHOD FOR
RESERVATION PROTOCOL MESSAGE PROCESSING

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of the filing date of U.S. provisional application No. 60/086,246, filed on May 21, 1998.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to packet networks, and, more particularly, to scheduling of control protocol message processing by a router.

2. Description of the Related Art

Packet networks, such as Internet Protocol (IP) based networks, are increasingly providing differentiated services. One approach for providing differentiated services employs type of service (TOS) bits defined in the packet header. Routers within the packet network interpret the TOS bits in a predetermined manner so as to provide the differentiated services. Another approach, which is a reservation-based approach, employs control messages to reserve network resources for the duration of a connection defined by a packet flow, or packet flow aggregates ("flows"). For this reservation-based approach, a protocol that may be employed to signal reservation of network resources is the Reservation Setup Protocol (RSVP). RSVP, as an example, may be used in conjunction with service models, such as guaranteed rate and controlled load service models, to request a desired quality of service (QOS) for certain packet flows.

RSVP is a receiver-oriented resource reservation protocol: reservations are initiated when a source (sender) requests a resource reservation, such as a reservation for a certain amount of bandwidth of a transmission line or logical link during connection set-up or during an established connection. This RSVP request is signaled through the network using a PATH message. The PATH message is routed along the network to its destination (or set of destinations) through a series of routers in a similar manner to that of other IP packets. However, before propagating a PATH message, each router checks if sufficient requested resources are available. If the requested resources are available, the router first establishes a flow-state for the packet flow (or aggregated flows) indicated by this request and then propagates the PATH message. The packet flow in progress is maintained by periodic UPDATE messages generated by the source. Also, each intermediate router, and the destination, starts a counter, or refresh timer, that is employed to generate a processor interrupt causing termination of the packet flow if no RESV or UPDATE messages are received for the packet flow before the timer expires.

When the PATH message reaches the desired destination (recipient), the destination sends back a RESV message through the network to the source. The RESV message may be, for example, a bandwidth request that may be different from the bandwidth requested in the PATH message. When an intermediate router of the network receives a RESV message for which there is an established packet flow for a connection, the intermediate router commits the requested bandwidth to the packet flow. From the point-of-view of this intermediate router, the packet flow is now in progress. Each PATH and RESV message received by the intermediate router, and destination, resets the refresh timer. Periodic UPDATE messages are generated by the source and received by each router. Each router, upon processing of UPDATE messages, resets its refresh timer and propagates the UPDATE message. A packet flow is terminated, the connection torn-down, and the reserved resources released by an intermediate router (or the destination) when either an explicit TEAR-DOWN message generated by the source or destination is received, or when a refresh timer expires.

RSVP facilitates exchange of resource reservation information among routers in the packet network and is a soft-state protocol which relies upon periodic refresh message requests (UPDATE messages) to maintain router state information. Refresh messages that are not sent or processed may cause an established packet flow to be terminated. Periodic refresh messages, and consequent soft state information in the routers, permit the RSVP protocol to operate robustly in the presence of packet flow route changes and lost signaling messages, without requiring explicit messages to terminate the packet flow. Soft state protocols, such as RSVP, allow packet networks to provide services comparable to those in virtual circuit networks with explicit connection establishment and termination.

However, the processing section of each router must process the periodic refresh messages. Router processing load increases with the number of established RSVP packet flows passing through the router, even if these RSVP packet flows are not actively sending packets. The RSVP message load offered to the processing section comprises i) message requests due to RSVP reservation connection establishment and termination and ii) message requests due to refresh messages generated by established RSVP packet flows. Even though refresh messages consume a relatively small capacity of the processing section, the offered processor load due to refresh messages increases as the number of in-progress RSVP packet flows through the router increases. Even if an "adequate" control processor with capacity determined by traffic engineering rules is employed in the processing section, temporary overloading of the control processor may occur. Such temporary overloading may result from a "mass call-in" that generates a relatively large number of new reservation message requests for connection establishment or termination within a relatively short period of time. In addition, the control processor does not necessarily process RSVP message requests alone, but may also handle other routing tasks. With a burst of routing instabilities, and recalculations of network routes, for example, these routing tasks may require a considerable portion of available processing capacity, causing a bottleneck in processing of RSVP message requests. Consequently, reservation blocking is possible even though link capacity may be available.

Appropriate scheduling of message processing by the processing section may be used to maximally utilize the link capacity within the constraints of the available processing resources. For example, consider the case when the link utilization is relatively high. Processing PATH or RESV messages before processing TEAR-DOWN messages is not advantageous since available bandwidth is unlikely to satisfy these new requests. Therefore, for the high link utilization case, processing TEAR-DOWN messages in the router's queue before processing PATH messages is beneficial. However, routers of the prior art employ scheduling that is not link-state dependent, such as First In First Out (FIFO) processing. Similarly, for the case when link utilization is relatively low, processing of TEAR-DOWN messages may be deferred, allowing scarce processing
resources to process new PATH and RESV messages. However, this deference is similarly not adopted by FIFO scheduling. Furthermore, giving priority to UPDATE message processing is desirable since deferment of UPDATE message processing may result in expiration of the refresh timer, and so terminate the packet flow, in the router or in downstream routers.

SUMMARY OF THE INVENTION

The present invention relates to allocation of processing capacity to processing control messages of a router in a packet network. A link utilization value of a link coupled to the router is monitored, and a message request size and a corresponding weight for at least one class of control messages are calculated. Each weight is calculated based on the link utilization value and the message request size of each class. A portion of the processing capacity of the router is allocated for each class of control messages based on the corresponding weight of the class.

BRIEF DESCRIPTION OF THE DRAWINGS

Other aspects, features, and advantages of the present invention will become more fully apparent from the following detailed description, the appended claims, and the accompanying drawings in which:

FIG. 1 shows a block diagram of a packet network employing a round-robin scheduling method with adaptive weighting assignment in accordance with the present invention;

FIG. 2 shows a block diagram of an exemplary embodiment of a processing section of a router employing a scheduling method in accordance with the present invention;

FIG. 3 shows a flow chart of an algorithm implementing a round-robin scheduling method with adaptive weighting assignment employed by a scheduler of FIG. 2;

FIG. 4A shows a number of flows for relatively low processor load without router refresh timer expiration in accordance with a scheduling algorithm of the prior art;

FIG. 4B shows a number of flows for relatively high processor load without router refresh timer expiration in accordance with a scheduling algorithm of the prior art;

FIG. 5A shows a number of flows for relatively low processor load with router refresh timer expiration in accordance with a scheduling algorithm of the prior art;

FIG. 5B shows a number of flows for relatively high processor load with router refresh timer expiration in accordance with a scheduling algorithm of the prior art;

FIG. 6 shows simulation results of a moderately loaded processor employing an adaptive scheduling method in accordance with an exemplary embodiment of the present invention; and

FIG. 7 shows simulation results for the system of FIG. 6 modified to allocate a relatively high portion of processing resources to route processing.

DETAILED DESCRIPTION

FIG. 1 shows a block diagram of a packet network 100 employing a round-robin scheduling method with adaptive weighting assignment in accordance with the present invention. The packet network 100 includes a source 102 in communication with destinations 108 and 109 through routers 103, 104 and 105. The packet network 100 employs a reservation-based protocol, such as RSVP, to allocate packet flows from the source 102 to one or more of the destinations 108 and 109 through routers 103, 104, and 105. The routers 103, 104 and 105 and destinations 108 and 109 employ scheduling of processing capacity of corresponding processing sections allocated to message requests of the reservation-based protocol. For the following description of exemplary embodiments, the packet network 100 employs the reservation-based protocol RSVP, although the present invention is not so limited.

A connection that establishes packet flows between source 102 and at least one destination 108 and 109 may be set up by message requests in the following manner. The source 102 desires to establish a packet flow with, for example, destination 108. The source 102 generates a PATH message requesting, for example, a connection having a specified amount of bandwidth. The PATH message is routed through the network by routers 104 and 105, for example, to destination 108. Before propagating the PATH message, each router 104 and 105 checks if sufficient requested bandwidth resources are available. If the requested resources are available, router 104 first establishes a soft state for the packet flow indicated by this request and then propagates the PATH message to router 105. Routers 104 and 105 start respective refresh timers for the packet flows that cause termination of the packet flow when no RESV or UPDATE messages are received before the refresh timer expires.

When the PATH message reaches destination 108, destination 108 sends back a RESV message through the routers 105 and 104 to the source 102. The RESV message may include, for example, a message request for bandwidth reservation that may be different from the bandwidth reservation message request of the PATH message. When routers 104 and 105 receive the RESV message, each router 104 and 105 commits the requested bandwidth to the packet flow, if available. Once the RESV message reaches source 102, the connection is established. Each PATH and RESV message received by routers 104 and 105 resets the corresponding refresh timer. The connection is maintained by periodic UPDATE messages generated by the source 102 and destination 108. Each router 104 and 105, upon processing of an UPDATE message, resets the corresponding refresh timer and propagates the UPDATE message. The connection may be terminated either when an explicit TEAR-DOWN message generated by the source 102 or destination 108 is received, or when the refresh timer of the router 104 or 105 expires.

FIG. 2 shows a block diagram of a processing section 200 of a router, such as routers 103-105, employing a round-robin scheduling method with adaptive weighting assignment in accordance with the present invention. The processing section 200 includes controller 202, message-routing processor 204, scheduler 206 having timing section 208, and packet classifier 210. Further included is the input queue 212 having receive queues 220 for each transmission line terminated by the router, and output buffer 214 having transmit buffers 222 for each output transmission line of the router. Each transmission line supports message or other logical traffic for one or more connections, which support may be defined as a link.

Packet classifier 210 is employed for processing of control or other system-signaling type messages. Packet classifier 210 [...] a packet classifier processing module that also classifies data packets for traffic routing purposes. As described below, the term "packet" may indicate a control message, but may also be a portion of a control message, since control messages may be formed from [...]. Packets received by the router are stored in input queue 212. Packet classifier 210 monitors each packet of the input queue 212, applying, for example, a packet filter to each packet to determine the type of packet and so identify control messages. PATH, RESV, UPDATE and TEAR-DOWN
messages, for example, are identified as control messages by
the packet classifier 210 and reported to the scheduler 206
and controller 202. Controller 202 processes each control
message based on, for example, header information and/or
message type to determine how to process the control
message to establish, maintain or tear-down a connection.
Message-routing processor 204, in accordance with signals 
of controller 202, transfers packets stored in the input queue 
212 to a corresponding output transmission line. 

Scheduler 206, upon receiving notice of a PATH, RESV,
UPDATE or TEAR-DOWN message from packet classifier
210, begins a corresponding counter of timing section
208. This counter may be provided as, for example, a refresh 
timer for a corresponding packet flow. If the counter of 
timing section 208 expires, an action, such as dropping the 
corresponding packet of the input queue 220, is performed. 
The scheduler 206 also allocates processing capacity of 
controller 202 with a round-robin scheduling method with 
adaptive weighting assignment in accordance with the 
present invention. Briefly, the PATH, RESV, UPDATE and 
TEAR-DOWN messages are each a priori assigned to a 
class. Scheduler 206 then allocates portions of the process- 
ing capacity of the processing section 200 to each of these 
classes based on link utilization. The PATH, RESV, 
UPDATE and TEAR-DOWN messages are processed by 
controller 202 in accordance with the allocated processing 
capacity and, hence, subsequently routed by message- 
routing processor 204. 
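
The refresh-timer behavior described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the class name `RefreshTimers`, the flow identifiers, the caller-supplied clock values, and the single fixed timeout are hypothetical and do not come from the specification, which leaves the timer implementation to timing section 208.

```python
class RefreshTimers:
    """Soft-state timers: one deadline per packet flow (hypothetical helper)."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout      # assumed fixed refresh interval, in seconds
        self._deadline = {}         # flow id -> time at which the flow expires

    def reset(self, flow_id: str, now: float) -> None:
        # Called when a PATH, RESV or UPDATE message for the flow is processed.
        self._deadline[flow_id] = now + self.timeout

    def expire(self, now: float) -> list:
        # Return and forget every flow whose deadline has passed; the caller
        # would then terminate those flows and release their resources.
        expired = [f for f, d in self._deadline.items() if d <= now]
        for flow_id in expired:
            del self._deadline[flow_id]
        return expired
```

A flow that keeps receiving refresh messages has its deadline pushed forward on each `reset` call and never appears in the list returned by `expire`.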

The weights are calculated by a processor, which may be 
a processor of the controller 202 or of the scheduler 206. For 
convenience, the processor of the scheduler 206 of the 
exemplary embodiment calculates the weights at predetermined
intervals in time and also allocates processing
capacity, although the present invention is not so limited.
The processor may also calculate average message request 
size, which may be defined as the average requested quality 
of service (QoS) metric. The QoS metric may be an average 
reserved bandwidth requested by the RSVP signaling 
messages, but other QoS metrics related to link bandwidth may
be employed, for example, minimum bandwidth, transmission
delay, or probability of packet loss.
The method by which weights are calculated is described 
subsequently with respect to FIG. 3. 
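
The description elsewhere mentions an exponential smoothing model with a forget factor for tracking the average message request size. A minimal Python sketch of one common form of such smoothing follows. The update rule, the class name `SmoothedAverage`, and the sample values are assumptions made for illustration; the specification does not give the exact formula.

```python
class SmoothedAverage:
    """Exponentially smoothed running average (illustrative, not from the source)."""

    def __init__(self, forget_factor: float, initial: float = 0.0) -> None:
        if not 0.0 < forget_factor <= 1.0:
            raise ValueError("forget_factor must be in (0, 1]")
        self.alpha = forget_factor  # larger alpha forgets old samples faster
        self.value = initial

    def update(self, sample: float) -> float:
        # Move the average a fraction alpha of the way toward the new sample.
        self.value += self.alpha * (sample - self.value)
        return self.value


# Hypothetical usage: request sizes expressed as fractions of link capacity.
pr_ave = SmoothedAverage(forget_factor=0.01, initial=0.02)
for requested_fraction in (0.02, 0.04, 0.03):
    pr_ave.update(requested_fraction)
```

With this form, a forget factor on the order of 1/N makes the average track changes over roughly N samples, which is one way to read the "few hundred packet inter-arrival times" time-scale given in the description.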

To allocate portions of processing capacity for message 
processing by the controller 202, a monitoring circuit to 
monitor link utilization may be included in the router. Such 
monitoring circuit may be included in the controller 202, 
scheduler 206 or input queue 212. However, the function of
this circuit may be distributed in whole or in part in other 
circuitry within the router (e.g., in transmission line termi- 
nation cards not shown in FIG. 2). For the exemplary 
embodiment of FIG. 2, the monitoring circuit may be 
included in the scheduler 206. The scheduler 206 desirably 
monitors link utilization as, for example, traffic, the fraction 
of the link capacity in use. Such monitoring may include, for 
example, determining an average number of PATH, RESV, 
UPDATE and TEAR-DOWN messages received, number of
established connections, average packet length, or average 
time in the receive queue 220. Alternatively, message- 
routing processor 204 and controller 202 may monitor link 
utilization of each transmission line. 

The round-robin scheduling method with adaptive 
weighting assignment as applied to the PATH, RESV, 
UPDATE and TEAR-DOWN message processing is now
described. To simplify the following description of the preferred embodiment, refresh timers are set to fixed values that are at least an order of magnitude larger than typical round-trip times. As would be apparent to one skilled in the art, however, time adaptation mechanisms may be employed for the refresh timers. An initial model for the processing section 200 of each router 103-105 may be as follows. The processing section 200 services each logical interface of a transmission line. RSVP signaling message requests arriving over the logical interface are provided to the processing section 200 and placed in input queue 212 for further processing. A service discipline for the queue is FIFO with no distinctions made between the different message types except for weighted scheduling of the RSVP signaling message requests in accordance with the present invention. Instead of FIFO processing of RSVP messages, weighted scheduling in accordance with the present invention processes each message type with assigned portions of processing capacity based on a priori traffic statistics, or link utilization. Weighted scheduling of the processing section 200 may be defined as an allocation of a predetermined amount, or percentage, (the "weight") of overall processing capacity of the processing section 200 to a message class. This processing capacity may be the processing capacity of controller 202, but other schemes may be employed. For example, signaling message requests may be classified into just three classes: PATH & RESV messages, UPDATE messages and TEAR-DOWN messages. Messages of each class are weighted by allocating the processing capacity (e.g., percent of processing time of the processor section) to process messages of each class in FIFO manner. "Round-robin" may be defined as switching the processing by the controller between the classes (i.e., switching the message processing) in a predetermined, cyclic order.
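
The weighted round-robin service just defined can be sketched in Python as follows. This is a minimal illustration, not the scheduler of the invention: the class names in `ORDER`, the function `run_cycle`, and the notion of a fixed number of processing slots per cycle are assumptions introduced here to make the weights concrete.

```python
from collections import deque

# Hypothetical class names; the text groups messages into these three classes.
ORDER = ("PATH_RESV", "UPDATE", "TEAR_DOWN")


def run_cycle(queues, weights, slots_per_cycle):
    """Serve each class once, in fixed cyclic order, and return what was processed.

    queues:  class name -> deque of pending messages (FIFO within a class)
    weights: class name -> fraction of processing capacity for that class
    slots_per_cycle: assumed number of messages the processor handles per cycle
    """
    processed = []
    for cls in ORDER:
        # The weight is turned into a per-cycle message budget for the class.
        budget = round(weights[cls] * slots_per_cycle)
        queue = queues[cls]
        while budget > 0 and queue:
            processed.append(queue.popleft())  # oldest message first
            budget -= 1
    return processed
```

In this sketch a class whose share rounds to zero slots is skipped for that cycle, and unused budget is not handed to other classes; a practical scheduler would need a policy for both cases.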

An immediate disadvantage of using fixed weights is the difficulty of "choosing" an appropriate weight for processing of UPDATE messages. The UPDATE message traffic increases in proportion to the number of RSVP packet flows already established. Moreover, if UPDATE messages are lost due to insufficient assigned weight, then existing flows may be unnecessarily torn down. Giving priority to UPDATE messages and round-robin scheduling with fixed weight processing amongst the other classes may improve performance. Round-robin scheduling with fixed weight processing, however, has further disadvantages. For example, if the message load (traffic) of UPDATE messages is very high due to a very large number of established packet flows, then the TEAR-DOWN messages may not be processed adequately. Hence, packet flows that should be torn down may last longer than necessary, increasing link utilization while preventing new packet flows from having reservation requests processed, and so connections established. Furthermore, the fixed-weight round-robin method does not account for link utilization, even though knowledge of link utilization may be used to increase the probability that message requests will be processed in a satisfactory manner.

In accordance with the present invention, round-robin scheduling with adaptive weighting assignment employs knowledge of link utilization to increase performance of message processing of each link, under varying traffic conditions. For simplicity, scheduling for three classes of service is described, although the present invention is not so limited. PATH & RESV messages are assigned to a first class, UPDATE messages are assigned to a second class, and
TEAR-DOWN messages are assigned to a third class. Corresponding weights may be denoted as $w_{PR}$ for PATH & RESV messages, $w_{UD}$ for UPDATE messages, and $w_{TD}$ for TEAR-DOWN messages. As described previously, when the counter of timing section 208 expires, an interrupt is generated for the controller 202 to terminate a respective packet flow. High priority may be assigned to these interrupts and these interrupts do not queue as TEAR-DOWN messages.

A single level scheme of round-robin scheduling with adaptive weighting assignment is now described for a case when, for example, the input queue 212 employs a single receive queue 220 of a single link (FIG. 2). An average reservation message request size, expressed in fraction of link capacity, for PATH & RESV messages is defined as $PR_{ave}$ bits/sec. As defined herein, "size" may be the average size or amount of bandwidth, or other form of capacity (related to bandwidth) requested by the messages. The average reservation message request size $PR_{ave}$ may be computed with an exponential smoothing model and with a forget factor $\alpha$ selected to track message request size over time-scales of the order of a few hundred packet inter-arrival times. Similarly, an average termination (or "tear-down") message request size, expressed in fraction of link capacity, for TEAR-DOWN messages is defined as $TD_{ave}$. The value for $TD_{ave}$ may be similarly computed with a corresponding forget factor $\alpha_{TD}$. Link utilization, or utilized link capacity, is denoted as $U$ bits/sec, and $n_{PR}$ is defined as $(1-[U/PR_{ave}])$ and $n_{TD}$ is defined as $(U/TD_{ave})$. The weight $w_{PR}$ for processing PATH and RESV messages is now calculated as in equation (1)

$$w_{PR} = \frac{n_{PR}}{n_{PR} + n_{TD}} \qquad (1)$$

Similarly, the weight $w_{TD}$ for processing TEAR-DOWN messages is calculated as in equation (2)

$$w_{TD} = \frac{n_{TD}}{n_{PR} + n_{TD}} \qquad (2)$$

Although equations (1) and (2) are shown with processing time for the three message types being the same, as would be apparent to one skilled in the art, processing time may be different in practice. However, the weights calculated from equations (1) and (2) may be scaled proportionally to the processing times.

The weights that are calculated in the equations (1) and (2) do not include a factor for the arrival rates of packets into the input queue 212. If it is desired to account for the arrival rates to input queue 212, then average queue length may be employed as the factor. Weights may be computed by adding the average queue length to $n_{PR}$ for the PATH & RESV message class, and then by adding the average queue length to $n_{TD}$ for the TEAR-DOWN message class. Weights assigned to the classes are then computed as in equations (1) and (2).

The weight $w_{UD}$ for processing of UPDATE messages may be calculated in a similar manner to that given in equations (1) and (2) by defining an average update message size for UPDATE messages as $UD_{ave}$, a forget factor $\alpha_{UD}$, and $n_{UD}$ as $U/UD_{ave}$, and modifying equations (1) and (2). However, a preferred embodiment may set the weight of UPDATE message processing based on current link utilization of established packet flows. Since the UPDATE message traffic increases in proportion to the number of RSVP packet flows already established, the weight $w_{UD}$ for processing of UPDATE messages may be adaptively varied based on the number of established RSVP packet flows in progress for the link. Therefore, the weight $w_{UD}$ may be as given in equation (3):

$$w_{UD} = k \cdot (\%\ \text{of}\ U\ \text{for established RSVP packet flows}) + C \qquad (3)$$

In equation (3), $k$ is a factor that may be experimentally optimized, and $C$ is a constant to account for arrival rates based on queue length. Fixed priority to UPDATE messages and round-robin scheduling with adaptive weighting assignment amongst the other classes may also be employed.

As given by equations (1) and (2), a weight assigned to PATH and RESV messages is increased when $U$ is small. The value for $U$ may be small, for example, when reservation message request sizes are small and when the link utilization is low. Therefore, for low utilization, a scheduling method assigning weights in accordance with the present invention increases the rate at which PATH and RESV messages are processed. This increase in rate occurs since delaying processing of TEAR-DOWN messages does not affect link utilization. Similarly, when the value for $U$ is large, the link utilization is very high and the scheduling method processes TEAR-DOWN messages at a higher rate, thereby decreasing the probability that the next PATH or RESV message is blocked.

In accordance with another exemplary embodiment of the present invention, processor scheduling for multiple links corresponding to multiple receive queues 220 of the input queue 212 shown in FIG. 2 may be employed. Processor scheduling for the case of multiple links employs a double-level hierarchical scheme of round-robin scheduling with adaptive weighting assignment instead of the single-level scheme such as described with respect to equations (1)-(3). At a high level, a super-class is defined for each link and processing capacity allocated to each link based on the corresponding weight of the super-class. At a low level for each super-class of a link, the same message class definitions and weights are used as in the single-level scheme described previously.

Links j = {1, 2, i, ..., n} (j, i, and n each an integer) are associated with the processing section 200. Weights for the lower classes associated with each super-class are calculated in the same manner as described above with respect to equations (1) and (2). The weights for the low level class of link $i$ are denoted by $w^{i}_{PR}$ and $w^{i}_{TD}$. For each super-class of link $i$, the super-class weight $W_{i}$ is computed as in equation (4)

$$W_{i} = \frac{w^{i}_{PR} + w^{i}_{TD}}{\sum_{j=1}^{n} w^{j}_{PR} + \sum_{j=1}^{n} w^{j}_{TD}} \qquad (4)$$

The low level weights $w^{i}_{UD}$ for UPDATE message processing of each link $i$ are determined as described above with respect to equation (3). However, the low level class weights $w^{i}_{UD}$ for UPDATE messages may also be calculated in a similar manner to that described for equations (1) and (2) described previously. However, for the double-level scheme the update weights $w^{i}_{UD}$ should then be included in equation (4) for the super-class weight $W_{i}$.

For this exemplary embodiment employing a double-level scheme, when a link is highly utilized (experiencing a high blocking), or a link is lightly utilized (wasting bandwidth which could be used if the processor were not a bottleneck) then one of the assigned lower-class weights of the link is high. If other links are not in these extreme situations, then the assigned weights of the other links are not as high. Hence, the super-class weights tend to give a higher share of the processor to links which are at the extremes of utilization and have a backlog of messages to be processed. This weighting assignment enforces a fairness of processing
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allocation between the links. Note that if one link has a large 
number of reservation message requests with small size 
while another link has a small number of reservation mes- 
sage requests with large size, then the former link gets a 
higher super-class weight. Also, link speeds may differ 
since, during weight computation, link speeds and message 
request sizes may be normalized. 
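
The double-level allocation described above can be illustrated with a
short Python sketch. It is only a sketch under stated assumptions: the
function name allocate_two_level, the dictionary layout, and the example
weight values are hypothetical, and the weights are taken as given rather
than computed from equations (1)-(4), which are not reproduced here.

```python
def allocate_two_level(capacity, super_weights, class_weights):
    """Split processing capacity across links, then across message classes.

    capacity      -- total processing capacity per scheduling round (any unit)
    super_weights -- {link: W_i}, assumed already computed
    class_weights -- {link: {message_class: weight}}, assumed already computed
    """
    total_super = sum(super_weights.values())
    if total_super <= 0:
        raise ValueError("super-class weights must sum to a positive value")

    shares = {}
    for link, w_super in super_weights.items():
        # First level: the link's share of the processor.
        link_capacity = capacity * w_super / total_super
        total_class = sum(class_weights[link].values())
        if total_class <= 0:
            raise ValueError("class weights must sum to a positive value")
        # Second level: divide the link's share among its message classes.
        shares[link] = {
            cls: link_capacity * w / total_class
            for cls, w in class_weights[link].items()
        }
    return shares


example = allocate_two_level(
    100,
    {"link_a": 3, "link_b": 1},
    {
        "link_a": {"PATH_RESV": 1, "UPDATE": 1, "TEAR_DOWN": 2},
        "link_b": {"PATH_RESV": 1, "UPDATE": 1, "TEAR_DOWN": 3},
    },
)
```

In this illustration link_a, with the larger super-class weight, receives
three quarters of the capacity, and that share is then divided among its
three message classes in proportion to their class weights.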

Alternative embodiments of the present invention may 
employ any number of modifications to the adaptive deter- 
mination of weights to further account for characteristics of 
the packet network. For example, queue content may be 
included and so this information may be employed to bias 
the scheduling method according to message classes with 
longer queues, or weights may be determined only according
to the queue length of each message class. Further, 
weights may be multiplied by the normalized work accu- 
mulated in the corresponding message queues of the receive 
queues 220. 

FIG. 3 shows a flow chart of an algorithm for implementing 
the round-robin scheduling method with adaptive weighting 
assignment in accordance with the present invention 
employed by the scheduler 206 of a router. The flow chart as 
shown in FIG. 3 is exemplary only. As would be apparent to 
one skilled in the art, the basic steps may be augmented, or 
the steps separately implemented in two or more processing 
sections of the router. In addition, the flow chart of FIG. 3 
does not show the effect of expiration of a refresh timer, 
which effect may be implemented by employing an interrupt 
when the refresh timer expires for immediate processing of 
the connection termination. 

Referring to FIG. 3, first, at step 301, the scheduler 
algorithm determines whether weights for the classes should 
be updated. If so, the scheduler algorithm moves to step 310; 
otherwise, the scheduler algorithm moves to step 302 using, 
for example, weights for the assigned classes previously 
determined in step 311 (described subsequently). This test of 
step 301 may be employed in a manner such that the weight 
assignment method adaptively changes the weights over a 
reasonably short time, but also occurs relatively infrequently 
so as to not burden the controller or other processor of 
processing section 200. 

If, at step 301, the scheduler determines that the weights 
should be updated, then, at step 310, link utilization mea- 
surements are retrieved for the link or links, the individual 
classes of each link, and/or, if employed, the super-classes of 
the links. Next, at step 311, the weights are calculated in a 
manner similar to that of the exemplary calculations of 
equations (1)-(4), and then the algorithm moves from step 
311 to step 302. 

At step 302 the scheduler algorithm determines process- 
ing capacity allocated to processing messages for each of the 
assigned classes (e.g., PATH & RESV, UPDATE, and 
TEAR-DOWN messages) based on the corresponding 
weights. Then, the scheduler algorithm moves to step 303 to 
process messages of the first class. 

At step 303, the processing section processes messages of 
the first class in the receive queue 220 for a portion of 
allocated processing capacity based on the calculated weight 
for the first class. For example, PATH and RESV messages 
may be processed. The allocated portion may be a portion of 
the total processing capacity as measured in, for example, 
processor cycles, time, number of packets, or other measure 
of processing as known in the art. Once the allocated portion 
is exhausted, the scheduler algorithm moves to step 304. 

At step 304, the processing section processes messages of 
the second class in the receive queue 220 for a portion of 
allocated processing capacity based on the calculated weight 
for the second class. For example, UPDATE messages may 
be processed. Once the allocated portion is exhausted, the 
scheduler algorithm moves to step 305. 

At step 305, the processing section processes messages of 
the third class in the receive queue 220 for a portion of 
allocated processing capacity based on the calculated weight 
for the third class. For example, TEAR-DOWN messages 
may be processed. Once the allocated portion is exhausted, 
the scheduler algorithm moves to step 306. 

At step 306, once the messages of the last class (e.g., 
TEAR-DOWN messages) are processed, the algorithm 
processes other messages or performs other types of packet 
network processing during the remaining portion of 
allocated processing capacity. Then, when the remaining portion 
is exhausted, the scheduling algorithm returns from step 306 
to step 301. 
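
One pass through steps 302 to 306 can be sketched in Python as follows.
This is a minimal illustration, not the implementation of FIG. 3: the
names run_round, CLASSES and cost are hypothetical, processing capacity is
measured in arbitrary units with a fixed per-message cost for each class,
and the weights are assumed to have been determined beforehand (steps 301,
310 and 311 are outside the sketch).

```python
from collections import deque

# Message classes served in round-robin order (steps 303, 304, 305).
CLASSES = ("PATH_RESV", "UPDATE", "TEAR_DOWN")


def run_round(queues, weights, capacity, cost):
    """Serve each class for its weighted portion of capacity.

    queues   -- {class: deque of messages}
    weights  -- {class: weight}, assumed precomputed
    capacity -- capacity available to the three classes this round
    cost     -- {class: capacity units consumed per message} (assumption)
    Returns the messages processed, in order, and the unused capacity,
    which step 306 would spend on other processing.
    """
    total_weight = sum(weights[c] for c in CLASSES)
    processed = []
    remaining = capacity
    for cls in CLASSES:
        # Step 302: portion of capacity allocated to this class.
        budget = capacity * weights[cls] / total_weight
        # Steps 303-305: process until the portion is exhausted
        # or the class queue is empty.
        while queues[cls] and cost[cls] <= budget:
            processed.append(queues[cls].popleft())
            budget -= cost[cls]
            remaining -= cost[cls]
    return processed, remaining


queues = {
    "PATH_RESV": deque(["p1", "p2", "p3"]),
    "UPDATE": deque(["u1", "u2"]),
    "TEAR_DOWN": deque(["t1", "t2"]),
}
done, left = run_round(
    queues,
    weights={"PATH_RESV": 2, "UPDATE": 1, "TEAR_DOWN": 1},
    capacity=8,
    cost={"PATH_RESV": 1, "UPDATE": 1, "TEAR_DOWN": 2},
)
```

In this example the TEAR_DOWN portion covers only one message of cost 2,
so the second TEAR-DOWN message stays queued for the next round, and the
one unit of capacity left unused by the PATH_RESV class is returned for
other processing.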

Exemplary embodiments of the present invention may be 
simulated and compared with a simple and useful FIFO 
scheduling method of the prior art. For the exemplary 
simulations described below with respect to FIGS. 4A-7, a 
large number of sources and destinations exchange RSVP 
messages, with the characteristics of each reservation 
request varying. Other tasks that the processing section may 
perform, such as routing table recalculations, are accounted 
for in reserved processing capacity. The relative service 
times for RSVP message processing of the simulations were 
the same as those measured in existing network distributions 
of RSVP software. 

FIGS. 4A and 4B show a number of packet flows for low 
and high message processing load cases, respectively, 
presented to the processing section of a router in accordance 
with a FIFO scheduling method of the prior art having a 
disabled refresh timer. FIFO processing for PATH, RESV, 
and UPDATE messages is employed in the simulations of 
FIGS. 4A and 4B, and explicit TEAR-DOWN messages are 
given absolute priority. Not processing TEAR-DOWN 
messages in a FIFO manner and giving absolute priority releases 
link capacity (decreases link utilization) and reduces the 
chance of other requests being blocked. Message request 
size may be the same for all reservation message classes 
(i.e., models a scenario with many flows of same type). 
Connection or call holding time is defined as 300 seconds, 
message processing time is on the order of 100 ms, and 
TEAR-DOWN message processing time is two or three 
times higher than the PATH and RESV message processing 
times. 

Link utilization, expressed in number of packet flows 
since message request size is defined as a constant size, is 
shown in FIGS. 4A and 4B as a function of time for the two 
cases of processor utilization (low processor load and high 
processor load). For the high processor load case shown in 
FIG. 4B, the load offered is below capacity without 
TEAR-DOWN messages. Once TEAR-DOWN messages are 
included the load often exceeds available processing 
capacity. For the cases of the exemplary embodiment of FIGS. 4A 
and 4B, the refresh timer is disabled such that it does not 
expire, and the link utilization of the router is shown to be 
unsatisfactory. 

Referring to FIG. 4A, initially, when the number of packet 
flows is not very large, the processing load is low enough 
that many flows are successfully established. Note that to 
establish a packet flow, both its PATH and RESV messages 
must be successfully processed. As the number of flows 
increases, the UPDATE message traffic increases 
proportionally and, hence, increases processor load. The number of 
established packet flows continues to increase. In addition, 
some of the established packet flows start generating 
TEAR-DOWN messages, since their holding times have elapsed. 
Because TEAR-DOWN messages are given priority, the 
number of packet flows in progress decreases. 

When the number of packet flows has decreased 
sufficiently, the arrival rate of TEAR-DOWN messages 
decreases sufficiently such that new packet flows are 
established at a faster rate than the rate at which packet flows are 
terminated or torn down. Hence, the number of packet flows 
increases again. After a delay, equal to the message holding 
time, TEAR-DOWN messages are generated again and the 
number of flows (and hence link utilization) goes down. This 
oscillation in packet flow of FIG. 4A is more pronounced at 
high message load, such as is shown in the simulation results 
of FIG. 4B. 

FIGS. 5A and 5B show a number of packet flows for low 
and high message processing load cases, respectively, 
presented to the processing section of a router in accordance 
with a FIFO scheduling method of the prior art with a refresh 
timer enabled. The oscillation in packet flow is similar to 
that shown in FIGS. 4A and 4B. Extreme oscillation in 
packet flow occurs in both cases shown in FIGS. 5A and 5B. 
Since UPDATE messages are processed in a FIFO manner, 
UPDATE messages are queued and are significantly 
delayed, or lost, if the queue (buffers) is not large. Delaying 
queued UPDATE messages causes the refresh timer to 
expire even if set to a value an order of magnitude greater 
than end-to-end packet round-trip times. 

FIG. 6 shows simulation results for a moderately loaded 
processing section employing a round-robin scheduling 
method with both fixed weighting assignment and adaptive 
weighting assignment in accordance with an exemplary 
embodiment of the present invention. The simulation results 
for the round-robin scheduling method with fixed weights 
are labeled FWRR, and have weights chosen to be 
proportional to service times. The simulation results for the 
round-robin scheduling method with adaptive weight assignment 
in accordance with the present invention are labeled AWRR. 
Also shown in FIG. 6 are results of the FIFO scheduling 
method of the prior art under moderate processing load. 

For the simulation results shown in FIG. 6, and FIG. 7 
described subsequently, all reservation message request 
sizes are between 1 and 5 kbits/s; the mean call holding time 
is 180 seconds; and the inter-arrival times of the message 
requests follow an exponential distribution. For the 
exemplary simulations, the greatest number of bandwidth 
reservation requests are for small bandwidth reservation (e.g., for 
audio conferences of about 64 kb/sec). In addition, some 
sources generate requests for much larger bandwidth 
reservation (e.g., for video servers or video conferencing 
systems). 

As shown in the simulation results of FIG. 6, the 
bandwidth reserved for each scheduling method is plotted as a 
function of time. Since the processor load is moderate, the 
effect of scheduling is not very pronounced and the reserved 
bandwidth is actually somewhat higher for the FIFO 
scheduling method. 

However, the higher reserved bandwidth of the FIFO 
scheduling method is because TEAR-DOWN messages are 
delayed and processed later. Consequently, reserved 
bandwidth of the FIFO scheduler is wasted because the receiver 
or sender initiated connection tear-down that terminates 
bandwidth usage by the packet flow has not been processed. 
Both the FWRR and, in particular, the AWRR scheduling 
methods reclaim the reserved bandwidth faster, and hence 
reserved bandwidth of the FWRR and AWRR methods 
appears lower than that of the FIFO scheduling method. 

However, more spurious connection terminations occur 
for the FIFO scheduling method because UPDATE 
messages are not processed before expiration of the 
corresponding refresh timer. The number of spurious terminations is 
smaller for the FWRR scheduling method and is lowest for 
the AWRR scheduling method, showing the advantage of 
adaptive weighting assignment for scheduling of processor 
capacity in accordance with the present invention. 

FIG. 7 shows simulation results for a simulation system 
similar to that of FIG. 6 but with an added high-processing 
load due to route processing by the router's processing 
section. Total processing load is, therefore, much higher 
even though the offered load due to RSVP messages is the 
same. The results of the round-robin scheduling method with 
fixed weights are labeled FWRR, the results of the 
round-robin scheduling method with adaptive weight assignment 
in accordance with the present invention are labeled AWRR, 
and the results of the FIFO scheduling method of the prior 
art are labeled FIFO. 

When the processing load increases, the advantages of the 
AWRR scheduling method are shown for processing a 
number of flows in progress, and for reserved bandwidth. As 
shown in the simulation results of FIG. 7, the AWRR 
scheduling method is effective, for example, in extreme 
cases where reserved bandwidth is very low and when the 
reserved bandwidth reaches the maximum link bandwidth 
utilization. For the first case where reserved bandwidth is 
very low, a number of accepted reservations increases. For 
the second case where reserved bandwidth reaches a 
maximum, the number of established, or accepted, 
connections is maximized since reserved bandwidth of terminated 
connections is freed faster than with the FWRR scheduling 
method. 

A router employing a round-robin scheduling method 
with adaptive weighting assignment in accordance with the 
present invention for processing of message classes allows 
for five desirable processing features when a 
reservation-based protocol is employed in a packet network. First, 
refresh messages (UPDATE messages) generally require at 
least a fraction of the available bandwidth, and this fixed 
fraction may be made an increasing function of the number 
of packet flows in progress. An upper bound may be 
determined so as to maintain a minimum bandwidth 
available for other message types. Second, when link utilization 
is low, reservation message requests for establishing a 
connection (e.g., PATH and RESV messages) may be 
assigned a higher weight since delay of TEAR-DOWN 
processing generally does not adversely impact request 
blocking. However, when link utilization is high, processing 
of termination messages (e.g., TEAR-DOWN messages) 
may be given a higher weight since processing PATH or 
RESV messages before processing of TEAR-DOWN 
messages may result in the bandwidth request for each PATH 
or RESV message being denied. 

Third, assigned weights may be adjusted based on the size 
of average recent packet flow establishment and termination 
requests. For example, if reservations are small and link 
utilization is low, then the weight assigned to RESV 
messages may be increased since processing of each message 
has a much smaller impact on the link utilization. Similarly, 
if average recent flow establishment and termination 
request sizes are large, then the weights for reservations 
may be scaled down. Fourth, processing time for each 
message type, if known, may be accounted for in assigning 
weights. Fifth, instantaneous queue lengths by message type 
may be employed to control queue lengths in extreme 
situations when the link is totally over-utilized or totally 
under-utilized. 

While the exemplary embodiments of the present 
invention have been described with respect to a processing method, 
the present invention is not so limited. As would be apparent 
to one skilled in the art, various functions may also be 
implemented in circuits, or in a combination of circuits and 
digital-domain processing steps in a software program 
of, for example, a micro-controller or general purpose 
computer. 
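
As one illustration of such a software implementation, the first two
weighting features listed above (an UPDATE share that grows with the
number of flows up to an upper bound, and a shift of weight from
connection establishment toward termination as link utilization rises)
can be sketched in Python. The function adaptive_weights, its parameters,
and the linear formulas are assumptions chosen for illustration only; they
are not equations (1)-(4) of this description.

```python
def adaptive_weights(utilization, n_flows,
                     update_per_flow=0.01, update_cap=0.5):
    """Return illustrative class weights that sum to 1.

    utilization     -- link utilization in [0, 1]
    n_flows         -- number of established packet flows
    update_per_flow -- hypothetical UPDATE share added per flow
    update_cap      -- hypothetical upper bound on the UPDATE share
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must lie in [0, 1]")

    # UPDATE share increases with the number of flows, up to the bound,
    # so that capacity remains for the other message types.
    w_update = min(update_cap, update_per_flow * n_flows)
    rest = 1.0 - w_update

    # Low utilization favors PATH/RESV; high utilization favors TEAR-DOWN.
    # The linear split is an assumption of this sketch.
    return {
        "PATH_RESV": rest * (1.0 - utilization),
        "UPDATE": w_update,
        "TEAR_DOWN": rest * utilization,
    }


lightly_loaded = adaptive_weights(utilization=0.25, n_flows=20)
heavily_loaded = adaptive_weights(utilization=0.9, n_flows=100)
```

With these assumed parameters, the lightly loaded link gives most of its
weight to PATH and RESV processing, while the heavily loaded link reaches
the UPDATE upper bound and gives TEAR-DOWN processing most of the rest.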

It will be further understood that various changes in the 
details, materials, and arrangements of the parts which have 
been described and illustrated in order to explain the nature 
of this invention may be made by those skilled in the art 
without departing from the principle and scope of the 
invention as expressed in the following claims. 

What is claimed is: 

1. A processing section of a router for processing control 
messages in a packet network, the processing section com- 
prising: 

a monitoring module adapted to monitor a link utilization 
value of a link coupled to the router; 

a processor to calculate a message request size and a 
corresponding weight for at least one class of control 
messages, each weight calculated based on the link 
utilization value and each message request size; and 

a scheduling module adapted to allocate, for each class of 
control messages, a portion of the processing capacity 
of the processing section based on the corresponding 
weight of the class. 

2. The invention as recited in claim 1, wherein the control 
messages further include an update message class of control 
messages for maintaining at least one established packet 
flow of the link, and the processor further calculates the 
weight for the update message class based on the number of 
established packet flows of the link. 

3. The invention as recited in claim 2, wherein the control 
messages are in accordance with a reservation-based 
protocol, and the control messages include a first class of 
control messages for establishing at least one packet flow of 
the link and a second class of control messages for termi- 
nating at least one packet flow of the link. 

4. The invention as recited in claim 2, wherein the 
message request size is based on an average of a requested 
link characteristic of the control messages. 

5. The invention as recited in claim 4, wherein the 
requested link characteristic of the control messages is either 
bandwidth, transmission delay, or probability of lost packet. 

6. The invention as recited in claim 1, wherein: 

the monitoring module further monitors link utilization 
values for two or more links coupled to the router; 

the processor further calculates, for each link, a message 
request size and corresponding weight for each class of 
control messages based on the link utilization of the 
link, the processor further adapted to calculate a 
super-class weight for each link; and 

the scheduling module allocates the processing capacity 
of the processing section to each link based on the 
corresponding super-class weight, and allocates a 
portion of the processing capacity allocated to the link to 
each class of the link based on the corresponding 
weight of the class. 

7. A method for allocating processing capacity to control 
messages received by a router in a packet network, the 
method comprising the steps of: 

a) monitoring a link utilization value of a link coupled to 
the router; 

b) calculating a message request size and a corresponding 
weight for at least one class of control messages, each 
weight calculated based on the link utilization value 
and each message request size; and 

c) allocating, for each class of control messages, a portion 
of the processing capacity of the router based on the 
corresponding weight of the class. 

8. The method as recited in claim 7, wherein the control 
messages further include an update message class of control 
messages for maintaining at least one established packet 
flow of the link, and the calculating step b) further includes 
the step of b1) calculating the weight for the update message 
class based on the number of established packet flows of the 
link. 

9. The method as recited in claim 8, wherein, for the 
calculating step b), the control messages are in accordance 
with a reservation-based protocol, and the control messages 
include a first class of control messages for establishing at 
least one packet flow of the link and a second class of control 
messages for terminating at least one packet flow of the link. 

10. The method as recited in claim 8, wherein, for the 
calculating step b), the message request size of a class is 
calculated based on an average of a requested link 
characteristic of the control messages. 

11. The method as recited in claim 10, wherein, for the 
calculating step b), the requested link characteristic is either 
bandwidth, transmission delay, or probability of lost packet. 

12. The method as recited in claim 7, wherein: 

the monitoring step a) further includes the step of a1) 
monitoring link utilization values for two or more links 
coupled to the router; 

the calculating step b) further includes the steps of b2) 
calculating, for each link, a message request size and 
corresponding weight for each class of control 
messages based on the link utilization of the link, and b3) 
calculating a super-class weight for each link; and 

the allocating step c) further includes the step of c1) 
allocating the processing capacity to each link based on 
the corresponding super-class weight, and c2) 
allocating a portion of the processing capacity allocated to the 
link to each class of the link based on the corresponding 
weight of the class. 

13. A router of an IP packet network having a processing 
section for processing control messages in accordance with 
a reservation-based protocol, the processing section 
comprising: 

a monitoring module adapted to monitor a link utilization 
value of a link coupled to the router; 

a processor adapted to calculate a message request size 
and a corresponding weight for at least one class of 
control messages, each weight calculated based on the 
link utilization value and each message request size; 
and 

a scheduling module adapted to allocate, for each class of 
control messages, a portion of the processing capacity 
of the processing section based on the corresponding 
weight of the class. 

14. The invention as recited in claim 13, wherein the 
control messages further include an update message class of 
control messages for maintaining at least one established 
packet flow of the link, and the processor further calculates 
the weight for the update message class based on the number 
of established packet flows of the link. 

15. The invention as recited in claim 14, wherein: 

the monitoring module further monitors link utilization 
values for two or more links coupled to the router; 

the processor further calculates, for each link, a message 
request size and corresponding weight for each class of 
control messages based on the link utilization of the 
link, the processor further adapted to calculate a 
super-class weight for each link; and 

the scheduling module allocates the processing capacity 
of the processing section to each link based on the 
corresponding super-class weight, and allocates a 
portion of the processing capacity allocated to the link to 
each class of the link based on the corresponding 
weight of the class. 

***** 

MECHANISM FOR CONVEYING DATA 
PRIORITIZATION INFORMATION AMONG 
HETEROGENEOUS NODES OF A 
COMPUTER NETWORK 

CROSS-REFERENCE TO RELATED 
APPLICATIONS 

This invention is related to the following copending U.S. 
patent applications: 

U.S. patent application Ser. No. 08/839,435, titled 
TECHNIQUE FOR MAINTAINING PRIORITIZATION 
OF DATA TRANSFERRED AMONG HETEROGENEOUS 
NODES OF A COMPUTER NETWORK; U.S. 
patent application Ser. No. 08/833,837, titled TECHNIQUE 
FOR CAPTURING INFORMATION NEEDED TO 
IMPLEMENT TRANSMISSION PRIORITY ROUTING 
AMONG HETEROGENEOUS NODES OF A COMPUTER 
NETWORK, which applications were filed on even 
date herewith and assigned to the assignee of the present 
invention. 

U.S. patent application Ser. No. 08/926,539, titled 
TECHNIQUE FOR REDUCING THE FLOW OF TOPOLOGY 
INFORMATION AMONG NODES OF A COMPUTER 
NETWORK, which application was filed on Sep. 10, 1997 
and assigned to the assignee of the present invention. 

FIELD OF THE INVENTION 

The invention relates to computer networks and, more 
particularly, to the distribution of packet prioritization 
information among stations of a computer network. 

BACKGROUND OF THE INVENTION 

Data communication in a computer network involves the 
exchange of data between two or more entities 
interconnected by communication links and sub-networks. These 
entities are typically software programs executing on 
hardware computer platforms, such as end stations and 
intermediate stations. Examples of an intermediate station may be a 
router or switch which interconnects the communication 
links and subnetworks to enable transmission of data 
between the end stations. A local area network (LAN) is an 
example of a subnetwork that provides relatively short 
distance communication among the interconnected stations; 
in contrast, a wide area network (WAN) enables long 
distance communication over links provided by public or 
private telecommunications facilities. 

Communication software executing on the end stations 
correlates and manages data communication with other end 
stations. The stations typically communicate by exchanging 
discrete packets or frames of data according to predefined 
protocols. In this context, a protocol consists of a set of rules 
defining how the stations interact with each other. In 
addition, network routing software executing on the routers 
allows expansion of communication to other end stations. 
Collectively, these hardware and software components 
comprise a communications network and their interconnections 
are defined by an underlying architecture. 

Modern communications network architectures are 
typically organized as a series of hardware and software levels 
or "layers" within each station. These layers interact to 
format data for transfer between, e.g., a source station and a 
destination station communicating over the network. 
Specifically, predetermined services are performed on the 
data as it passes through each layer and the layers 
communicate with each other by means of the predefined protocols. 
The lower layers of these architectures are generally 
standardized and are typically implemented in hardware and 
firmware, whereas the higher layers are generally 
implemented in the form of software running on the stations 
attached to the network. Examples of such communications 
architectures include the Systems Network Architecture 
(SNA) developed by International Business Machines 
Corporation and the Internet communications architecture. 

The Internet architecture is represented by four layers 
which are termed, in ascending interfacing order, the net- 
work interface, internetwork, transport and application lay- 
ers. These layers are arranged to form a protocol stack in 
each communicating station of the network. FIG. 1 illus- 
trates a schematic block diagram of prior art Internet pro- 
tocol stacks 125 and 175 used to transmit data between a 
source station 110 and a destination station 150, 
respectively, of a network 100. As can be seen, the stacks 
125 and 175 are physically connected through a communi- 
cations channel 180 at the network interface layers 120 and 
160. For ease of description, the protocol stack 125 will be 
described. 

In general, the lower layers of the communications stack 
provide internetworking services and the upper layers, 
which are the users of these services, collectively provide 
common network application services. The application layer 
112 provides services suitable for the different types of 
applications using the network, while the lower network 
interface layer 120 of the Internet architecture accepts 
industry standards defining a flexible network architecture 
oriented to the implementation of LANs. 

Specifically, the network interface layer 120 comprises 
physical and data link sublayers. The physical layer 126 is 
concerned with the actual transmission of signals across the 
communication channel and defines the types of cabling, 
plugs and connectors used in connection with the channel. 
The data link layer, on the other hand, is responsible for 
transmission of data from one station to another and may be 
further divided into two sublayers: Logical Link Control 
(LLC 122) and Media Access Control (MAC 124). 

The MAC sublayer 124 is primarily concerned with 
controlling access to the transmission medium in an orderly 
manner and, to that end, defines procedures by which the 
stations must abide in order to share the medium. In order for 
multiple stations to share the same medium and still 
uniquely identify each other, the MAC sublayer defines a 
hardware or data link address called a MAC address. This 
MAC address is unique for each station interfacing to a 
LAN. The LLC sublayer 122 manages communications 
between devices over a single link of the network and 
provides for environments that need connectionless or 
connection-oriented services at the data link layer. 

Connection-oriented services at the data link layer gen- 
erally involve three distinct phases: connection 
establishment, data transfer and connection termination. 
During connection establishment, a single path is estab- 
lished between the source and destination stations. This 
connection, e.g., an IEEE 802.2 LLC Type 2 or "Data Link 
Control" (DLC) connection as referred hereinafter, is based 
on the use of service access points (SAPs); a SAP is 
generally the address of a port or access point to a higher- 
level layer of a station. Once the connection has been 
established, data is transferred sequentially over the path 
and, when the DLC connection is no longer needed, the path 
is terminated. The details of such connection establishment 
and termination are well-known and, thus, will not be 
described herein. 

The transport layer 114 and the internetwork layer 116 are 
substantially involved in providing predefined sets of 
services to aid in connecting the source station to the 
destination station when establishing application-to-application 
communication sessions. The primary network layer 
protocol of the Internet architecture is the Internet protocol (IP) 
contained within the internetwork layer 116. IP is primarily 
a connectionless network protocol that provides 
internetwork routing, fragmentation and reassembly of datagrams 
and that relies on transport protocols for end-to-end 
reliability. An example of such a transport protocol is the 
Transmission Control Protocol (TCP) contained within the 
transport layer 114. Notably, TCP provides connection-oriented 
services to the upper layer protocols of the Internet 
architecture. The term TCP/IP is commonly used to refer to the 
Internet architecture. 

Data transmission over the network 100 therefore consists 
of generating data in, e.g., sending process 104 executing on 
the source station 110, passing that data to the application 
layer 112 and down through the layers of the protocol stack 
125, where the data are sequentially formatted as a frame for 
delivery onto the channel 180 as bits. Those frame bits are 
then transmitted over an established connection of channel 
180 to the protocol stack 175 of the destination station 150 
where they are passed up that stack to a receiving process 
174. Data flow is schematically illustrated by solid arrows. 

(NN), are present in a session between the two end nodes. As 
can be seen, the APPN network nodes are further 
interconnected by a WAN 210 that extends the APPN architecture 
throughout the network. The APPN network nodes forward 
packets of an LU-LU session over the calculated route 
between the two APPN end nodes. An APPN network node 
is a full-functioning APPN router node having all APPN 
base service capabilities, including session services 
functions. An APPN end node, on the other hand, is capable of 
performing only a subset of the functions provided by an 
APPN network node. APPN network and end nodes are 
well-known and are, for example, described in detail in 
Systems Network Architecture Advanced Peer to Peer 
Networking Architecture Reference IBM Doc SC30-3422 and 
APPN Networks by Jesper Nilausen, printed by John Wiley 
and Sons, 1994, at pgs 11-33. 

FIG. 3 is a schematic block diagram of the software 
architecture of a prior art APPN node 300. As noted, 
application 302 executing on an APPN end node, such as EN 
202 of network 200, communicates with another end node, 

Although actual data transmission occurs vertically 
through the stacks, each layer is programmed as though such 
transmission were horizontal. That is, each layer in the 
source station 110 is programmed to transmit data to its 
corresponding layer in the destination station 150, as sche- 
matically shown by dotted arrows. To achieve this effect, 
each layer of the protocol stack 125 in the source station 110 
typically adds information (in the form of a header field) to 
the data frame generated byihe Seeding process as the frame 
descends the stack. At the destination station 150, the 
various encapsulated headers are stripped off one-by-one as 
the frame propagates up the layers of the stack 175 until it 
arrives at the receiving process. 

SNA is a mainframe-oriented network architecture that 
also uses a layered approach. The services included within 
this architecture are generally similar to those defined in the 
Internet communications architecture. In a SNA network, 
though, applications executing on end stations typically 
access the network through logical units (LU) of the sta- 
tions; accordingly, in a typical SNA network, a communi- 
cation session connects two LUs in a LU-LU session. 
Activation and deactivation of such a session is addressed by 
Advanced Peer to Peer Networking (APPN) functions. 

The APPN functions generally include session establish- 
ment and session routing within an APPN network. FIG. 2 
is a schematic block diagram of a prior art APPN network 
200 comprising two end stations 202, 212, which are typi- 
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each end node functions as both a logical port for the 
application to the network and as an end point of the 
communication session. The session generally passes 
through a path control module 312 and a data link control 
(DLC) module 316 of the node, the latter of which connects 
to various network transmission media. 

When functioning as an APPN router node, such as NN 
206, an intermediate session routing (ISR) module 305 
maintains a portion of the session in each "direction" with 
respect to an adjacent network node, such as NN 216 of 
network 200. In response to receiving the BIND message 
during session establishment, path control 312 and ISR 305 
are invoked to allocate resources for the session. In 
particular, each NN 206, 216 allocates a local form session 
identifier (LFSiD) for each uiiecuon-Gi the session; the 
LFS1D is thereafter appended to the packets in a SNA 
transmission header (TH) to identify the session context. 
Collectively, each of these individually-established "local" 
sessions form the logical communication session between 
the LUs 304 of the end nodes 202, 212. 

When initialing a session, the application 302 specifies a 
mode name that is carried within the BIND message and 
distributed to all APPN network nodes; the LU 304 in each 
node uses the mode name to indicate the set of required 
characteristics for the session being established. 
Specifically, the mode name is used by control point (CP) 
module 308 of each APPN node 300 to find a corresponding 
class of service (COS) as defined in a COS table 310. The 
CP coordinates performance of all APPN functions within 
the node, including management of the COS table 310. The 
COS definition in table 310 includes a priority level speci- 
fied by transmission priority (TP) information 320 for the 
packets transferred over the session; as a result, each APPN 



cally configured as end nodes (EN), coupled to token ring 55 network node is apprised of the priority associated with the 



(TR) subnetworks 204, 214, respectively. During session 
establishment, an EN (such as EN 202) requests an optimum 
route for a session between two LUs; this route is calculated 
and conveyed to EN 202 by an intermediate station func- 
tioning as a network node server (e.g., station 206) via a 
LOCATE message exchange through the network 200. 
Thereafter, a "set-up" or BIND message is forwarded over 
the route to initiate the session. The BIND includes infor- 
mation pertaining to the partner LU requested for the 
session. 

Intermediate session routing occurs when the intermedi- 
ate stations 206, 216, configured as APPN network nodes 



65 



packets of a LU-LU session. The SNA architecture specifies 
four (4) TP levels: network priority, high priority, medium 
priority and low priority. Path control 312 maintains a 
plurality of queues 314, one for each TP level, for transmit- 
ting packets onto the transmission media via DLC 316. 
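
The mode-name lookup and per-priority queueing described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the APPN implementation: the mode names, the contents of `COS_TABLE`, the `PathControl` class, the fallback to medium priority and the strict-priority service order are all hypothetical; only the four TP levels are taken from the text.

```python
from collections import deque
from enum import IntEnum


class TP(IntEnum):
    """The four SNA transmission priority levels named in the text."""
    NETWORK = 0   # served first in this sketch
    HIGH = 1
    MEDIUM = 2
    LOW = 3


# Hypothetical COS table: mode name carried in the BIND -> TP level.
COS_TABLE = {
    "INTERACTIVE_MODE": TP.HIGH,
    "BATCH_MODE": TP.LOW,
}


class PathControl:
    """Keeps one transmit queue per TP level."""

    def __init__(self):
        self.queues = {level: deque() for level in TP}

    def enqueue(self, packet, mode_name):
        # Unknown mode names fall back to MEDIUM (an assumption of this sketch).
        level = COS_TABLE.get(mode_name, TP.MEDIUM)
        self.queues[level].append(packet)
        return level

    def dequeue(self):
        # Strict priority service (assumed policy): drain higher levels first.
        for level in TP:
            if self.queues[level]:
                return self.queues[level].popleft()
        return None
```

A packet enqueued under `"INTERACTIVE_MODE"` is therefore dequeued ahead of one enqueued earlier under `"BATCH_MODE"`.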

Data link switching (DLSw) is a forwarding mechanism 
for the SNA architecture over an IP backbone network, such 
as the Internet. A heterogeneous DLSw network is formed 
when two DLSw switches interconnect the end nodes of the 
APPN network by way of the IP network; the DLSw 
switches preferably communicate using a switch-to-switch 
protocol (SSP) that provides packet "bridging" operations at 
the LLC (i.e., DLC) protocol layer. FIG. 4 is a schematic
block diagram of a prior art DLSw network 400 comprising
DLSw switches 406, 416 interconnecting ENs 402, 412 via
IP network 410. The DLSw forwarding mechanism is also
well-known and described in detail in Request for Comment
(RFC) 1795 by Wells & Bartky, 1995 at pgs 1-91.

According to the DLSw technique, a lower-layer DLC
connection is established between each EN and DLSw
switch; however, these connections terminate at the switches
406, 416. In order to provide a complete end-to-end connection
between the end nodes, the DLC connections are
"disposed" over a reliable, higher-layer transport
mechanism, such as TCP sessions. DLSw switches can
establish multiple, parallel TCP sessions using well-known
port numbers. All packets associated with a particular DLC
connection typically follow a single, designated TCP session.
Accordingly, SNA data frames originating at a sending
EN 402 are transmitted over a particular DLC connection
along TR 404 to DLSw switch 406, where they are encapsulated
within a designated TCP session as packets and
transported over IP network 410. The packets are received
by DLSw switch 416, decapsulated to their original frames
and transmitted over a corresponding DLC connection of TR
414 to EN 412 in the order received by switch 406 from EN
402.

Typically, all packets transmitted by DLSw switch 406
over a DLC connection/TCP session flow at the same
priority level from a single output queue 405 of the switch
and arrive at an output queue 415 of DLSw switch 416 in the
same order in which they are transmitted. When the switches
are configured as bridges to forward packets over a TCP
session through the IP network, prioritization is straightforward.
However, it may be desired to integrate the functions
of an APPN network node within switch 406 by overlaying
an APPN layer onto a DLSw layer of the switch; the
resulting hybrid node may prioritize the packets at the APPN
layer in an order governed by the TP information levels.

A problem that arises when deploying a hybrid node in
such a heterogeneous network is that the TP priority information
is lost when passing the packets between the APPN
and DLSw layers, primarily because the TP information is
not encapsulated within the packets. That is, the APPN layer
has knowledge of the TP levels associated with the packets
of a LU-LU session as a result of the BIND message
exchange during session establishment; yet that information
is not encapsulated within the associated packets and, thus,
is not conveyed beyond the APPN layer. An example of a
tagging mechanism suitable for use with the present invention
that conveys TP levels from the APPN layer to the
DLSw layer is disclosed in copending and commonly-assigned
U.S. patent application, titled Technique for Maintaining
Prioritization of Data Transferred Among Heterogeneous
Nodes of a Computer Network, filed herewith and
incorporated by reference as though fully set forth herein.

As described in the commonly-assigned application, the
APPN protocol layer of the hybrid node assigns a TP level
to each packet and passes that priority information to the
DLSw layer of the node via an application programming
interface extension. The TP level is converted to information
that is "tagged" to each packet and the DLSw layer allocates
each tagged packet to a TCP session based on the assigned
TP level. The tagged information is then encapsulated within
an IP header to enable intermediate routers to maintain the
order and priority of the packet as it is transmitted outbound
over the IP network to a receiving DLSw switch.

However, the tagged information within the IP header is
not discernible to the receiving DLSw switch and, thus, the
switch has no knowledge of the TP level associated with the
outbound packet. If that packet requests a response, the
DLSw switch cannot select, on the basis of priority, the
proper TCP session over which to transmit a corresponding
inbound packet; accordingly, the switch arbitrarily chooses
a session. If the chosen TCP session has a lower designated
priority than the session carrying the outbound packet,
network throughput may be negatively impacted.

One solution to this problem is to deploy another hybrid
node in place of the receiving DLSw switch. This approach
is not desirable primarily because a goal of heterogeneous
network design is to minimize the number of hybrid nodes
in the network. A reason for minimizing the number of
hybrid nodes is that such nodes require additional processing
and memory resources, thereby resulting in expensive
deployments. The present invention is directed to solving the
problem of conveying packet prioritization information,
assigned by a hybrid node of a heterogeneous network, to
switching nodes of the network.

SUMMARY OF THE INVENTION

The invention comprises a mechanism for conveying
information pertaining to transmission priority (TP) levels of
inbound packets transmitted over a heterogeneous network
from a switching node to a hybrid node of the network. The
mechanism comprises a packet-recognizing filter having a
novel format that is generated by the hybrid node and
dynamically transmitted to the switching node over a predefined
communication channel of the network. As
described further herein, the filter enables the switching
node to classify the inbound packets and assign them
appropriate TP levels.

In the illustrative embodiment, the heterogeneous network
is preferably a data link switching (DLSw) network
with end nodes interconnected by way of an Internet protocol
(IP) backbone network and the hybrid node is an
advanced peer-to-peer networking (APPN) node with DLSw
capabilities. Applications executing on the end nodes communicate
via logical unit to logical unit (LU-LU) sessions,
whereas the switching node communicates with the APPN
node using a switch-to-switch protocol (SSP) over data link
control (DLC) connections associated with the LU-LU sessions
of the DLSw network; these DLC connections are
further overlayed onto existing transmission control protocol
(TCP) sessions of the IP network. Preferably, each TCP
session is further associated with a TP level.

According to aspects of the invention, the predefined
communication channel may be implemented as either an
in-band channel over one of the existing TCP sessions using
novel extensions to SSP, or an out-band channel over a
newly-created TCP session. The format of the filter is
preferably customized for each channel implementation;
nevertheless, each filter includes a unique opcode identifying
the filter, a format identifier (FID) denoting the format of
a specific inbound packet, a local form session identifier
(LFSID) that classifies the LU-LU session context of the
specific packet and a priority identifier specifying the TP
level of the packet.

Operationally, an APPN protocol layer of the APPN node
passes the opcode, LFSID, FID and priority identifier to a
DLSw protocol layer of the node, through an application
programming interface (API), during establishment of the
LU-LU session. In response to the API, the DLSw layer
encapsulates these identifiers within fields of the filter and
transfers the filter over the communication channel to the
switching node. When transferring the filter over the in-band
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communication channel, the opcode is encapsulated within 
a SSP header, whereas for the out-band channel 
embodiment, additional addressing information is encapsu- 
lated with the opcode in fields of a defined header. 

Upon receiving the filter, a DLSw layer of the switching 
node stores the LFSID, FID and priority identifier and 
proceeds to examine each inbound packet prior to forward- 
ing it to the APPN node. Specifically, the switching node 
initially determines the format of each packet and if it 
matches the stored FID, the node compares the LFSID of the 
inbound packet with the stored LFSID to identify the 
LU-LU session context of the packet. If the values of these 
latter identifiers match, the switching node assigns to the 
inbound packet the TP level specified by the stored priority 
identifier and forwards the packet to the APPN node over an 
appropriate one of the existing TCP sessions. 
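
The two-stage comparison described above (format first, then session context) can be sketched as follows. This is an illustrative model only: the class names `Filter` and `InboundPacket`, the function names, the integer encodings and the fallback level used for unmatched packets are assumptions introduced here, not details taken from the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Filter:
    fid: int        # format identifier the filter applies to
    lfsid: int      # local form session identifier of the LU-LU session
    priority: int   # TP level to assign on a match


@dataclass
class InboundPacket:
    fid: int
    lfsid: int
    payload: bytes
    tp_level: Optional[int] = None


def classify(packet: InboundPacket, stored: Filter) -> bool:
    """Apply the stored filter; return True if a TP level was assigned."""
    if packet.fid != stored.fid:        # stage 1: packet format
        return False
    if packet.lfsid != stored.lfsid:    # stage 2: LU-LU session context
        return False
    packet.tp_level = stored.priority
    return True


def select_session(packet: InboundPacket, sessions: dict, default_level: int):
    """Pick the TCP session mapped to the packet's TP level.

    `sessions` maps TP level -> session handle; unmatched packets use
    `default_level`, which is a hypothetical policy for this sketch.
    """
    level = packet.tp_level if packet.tp_level is not None else default_level
    return sessions[level]
```

Keeping the FID test ahead of the LFSID test mirrors the order given in the text: the session identifier is only meaningful once the packet format is known.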

BRIEF DESCRIPTION OF THE DRAWINGS 

The above and further advantages of the invention may be 
better understood by referring to the following description in 
conjunction with the accompanying drawings in which like 
reference numbers indicate identical or functionally similar 
elements: 

FIG. 1 is a schematic block diagram of prior art commu- 
nications architecture protocol stacks, such as the Internet 
protocol stack, used to transmit data between stations of a 
computer network; 

FIG. 2 is a schematic block diagram of a prior art 
Advanced Peer to Peer Networking (APPN) network includ- 
ing APPN nodes; 

FIG. 3 is a schematic block diagram of the software 
architecture of a prior art APPN node;

FIG. 4 is a schematic block diagram of a prior art data link 
switching (DLSw) network; 

FIG. 5 is a block diagram of a heterogeneous computer 
network, including a DLSw node and an APPN/DLSw 
hybrid node for interconnecting various subnetworks and 
communication links on which the present invention may 
advantageously operate; 

FIG. 6 is a schematic block diagram of protocol stacks
contained within the DLSw and APPN/DLSw nodes of FIG. 5;

FIG. 7 is a schematic block diagram illustrating the 
assignment of priority levels among established communi- 
cation sessions and the distribution of packets among the 
sessions; 

FIG. 8 is a schematic block diagram of a novel packet- 
recognizing filter generated by the hybrid node of FIG. 5 and
dynamically transmitted to the DLSw node over a pre- 
defined communication channel in accordance with the 
invention; 

FIGS. 9A and 9B are schematic block diagrams depicting
formats of the novel packet-recognizing filter of FIG. 8; 

FIG. 10 is a flowchart illustrating use of the novel filter in 
accordance with the present invention; and 

FIG. 11 is a schematic block diagram depicting the format 
of a conventional transmission header upon which the 
inventive packet-recognizing filter may advantageously 
operate. 



DETAILED DESCRIPTION OF ILLUSTRATIVE 
EMBODIMENT 

FIG. 5 is a block diagram of a computer network 500
comprising a collection of interconnected communication
links and subnetworks attached to a plurality of stations. The
stations are typically computers comprising end stations
502, 512 and intermediate stations 600, 650. Preferably, the
end stations are Advanced Peer to Peer Networking (APPN)
end nodes, although the stations may comprise other types
of nodes such as Low Entry Networking nodes or Physical
Units 2.0 via Dependent Logical Unit Requestor functions.
In addition, the intermediate station 650 is a data link
switching (DLSw) node and intermediate station 600 is an
APPN/DLSw hybrid node.

Each node typically comprises a plurality of intercon- 
nected elements, such as a processor, a memory and a 
network adapter. The memory may comprise storage loca- 
tions addressable by the processor and adapter for storing 
software programs and data structures associated with the
inventive filtering mechanism and techniques. The processor 
may comprise processing elements or logic for executing the 
software programs and manipulating the data structures. An 
operating system, portions of which are typically resident in 
memory and executed by the processor, functionally orga- 
nizes the node by, inter alia, invoking network operations in 
support of software processes executing on the node. It will 
be apparent to those skilled in the art that other processor 
and memory means, including various computer readable 
media, may be used for storing and executing program
instructions pertaining to the techniques described herein. 

The subnetworks included within network 500 are preferably
local area networks (LANs) and the communication
links may include wide area network (WAN) links; in the
illustrative embodiment of the invention, the LANs are
preferably token rings (TR) 504, 514 and an IP network 510,
which may comprise a LAN and/or a WAN configuration
such as X.25, interconnects the nodes 600, 650.
Communication among the nodes coupled to the network
500 is typically effected by exchanging discrete data packets
or frames via connection-oriented service sessions between
the communicating nodes.

Heterogeneous (DLSw) network 500 is formed when 
APPN/DLSw hybrid node 600 is connected to DLSw node 
650 via IP network 510. FIG. 6 is a schematic block diagram 
of protocol stacks 610, 660 within the nodes 600 and 650, 
respectively. Applications executing on SNA devices (end 
stations) 602, 608 typically access the network through 
logical units (LUs) of the stations and communicate via 
LU-LU sessions. Hybrid node 600 functions to facilitate 
establishment and routing of these connection-oriented com- 
munication sessions within the network. To this end, proto- 
col stack 610 preferably comprises an APPN layer 612 that 
contains the software modules described in FIG. 3. 

The stack 610 also includes a Transmission Control 
Protocol/Internet protocol (TCP/IP) layer 616 containing 
those layers of the Internet communications architecture 
protocol stack (FIG. 1) needed to establish, e.g., conven- 
tional connection-oriented, TCP communication sessions. 
Physical sublayers 622 and 626 specify the electrical, 
mechanical, procedural and functional specifications for 
activating, maintaining and de-activating the physical links 
604 and 605 of the network. Protocol stack 660 of DLSw 
node 650 likewise includes a TCP/IP layer 666 and physical
sublayers 672 and 676, which are functionally equivalent to 
those layers of protocol stack 610. 

Each node 600, 650 further contains a DLSw layer 614, 
664 and data link control (DLC) layers 618, 620 and 668, 
670, respectively, the latter layers providing a connection-oriented
service via conventional DLC connections. The
DLSw layers provide a mechanism for forwarding data
frame traffic between devices 602, 608 over IP network 605.
Preferably, the DLSw layers 614, 664 cooperate in a peer-relationship
and communicate via a switch-to-switch protocol
(SSP) to, inter alia, define TCP sessions over the IP
network.

In the illustrative embodiment, there are a plurality of
connection/session "views" established within the network.
For example, from an APPN view, there is a DLC connection
646 between device 602 and APPN layer 612 of node
600, and a DLC connection 648 between APPN layer 612
and device 608. From a DLSw view, there is a DLC
connection 642 between APPN layer 612 and DLSw layer
614 of node 600, and a DLC connection 644 between DLSw
layer 664 and device 608; in order to provide reliable,
end-to-end connections between the devices, these DLC
connections are "overlayed" onto TCP sessions (denoted
645) between the two DLSw layers 614, 664. Lastly, from
a LU view, there are multiple LU-LU sessions 680 (at
various priority levels) between the LUs of devices 602 and
608.

It should be noted that the TCP sessions are initiated
between DLSw peers 614, 664 in accordance with a conventional
TCP transport protocol. Thereafter, SSP control
messages are exchanged between the DLSw layers 614, 664
of the nodes to establish an end-to-end DLSw circuit over
the session. Information contained within these control
messages is used to generate a DLSw circuit identifier (ID)
that associates the DLSw circuit with the session. Preferably,
the DLC connections 642, 644 overlayed on the TCP session
645 "map" to the DLSw circuit. The generation of DLSw
circuits and identifiers is described in Request for Comment
(RFC) 1795 by Wells & Bartky, 1995, while the establishment
of multiple TCP sessions between DLSw peer layers is
described in both RFC 1795 and Internetworking with
TCP/IP by Comer and Stevens, printed by Prentice Hall,
1991; all of these publications are hereby incorporated by
reference as though fully set forth herein.

Typically, packets transmitted by a DLSw switch over a
TCP session flow at the same priority level from a single
output queue of the switch and arrive at a peer DLSw switch
in the same order in which they are transmitted. Hybrid node
600 may, however, prioritize the packets of a LU-LU session
at the APPN layer 612 in an order specified by transmission
priority (TP) information contained within the node 600.
FIG. 7 is a schematic block diagram illustrating the assignment
of TP levels among established communication sessions
and the distribution of packets among those sessions.
A path control module 312 (FIG. 3) of the APPN layer 612
within node 600 maintains four queues 702-708, one for
each TP level, for transmitting data packets (received from
DLC connection 646) over established TCP sessions 645 of
the network. As described above, TCP sessions are established
through the IP network 605 in accordance with
conventional TCP/IP transport mechanisms within the
APPN/DLSw node 600 and the DLSw node 650;
illustratively, these nodes cooperate in a peer-relationship to
establish multiple, parallel TCP sessions 732-738 over the
network.

The tagging mechanism of the commonly-assigned application
incorporated by reference herein allows the hybrid
node 600 to convey a TP level from its APPN layer 612 to
its DLSw layer 614, convert that TP level to information that
is "tagged" to each outbound packet, and allocate the tagged
packet to a TCP session based on the assigned TP level.
Specifically, the DLSw layer 614 loads the packets into the
queue 725 and then distributes them among four queues
712-718. Each TCP session (and queue) is preferably associated
with a TP level; for example, session 732 (and queue
712) are assigned a high-priority level, session 734 (and
queue 714) are assigned a medium-priority level, session
736 (and queue 716) are assigned a normal-priority level and
session 738 (and queue 718) are assigned a low-priority
level.

The tagged information is encapsulated within an IP
header of the packet prior to outbound transmission over the
IP network 605 to the DLSw node 650. As noted, the tagged
information is not discernible to the DLSw node 650 and, if
required to respond to the packet, that node cannot select, on
the basis of priority, the proper TCP session over which to
transmit a corresponding inbound packet because it has no
knowledge of the TP level associated with the outbound
packet.

In accordance with the present invention, a mechanism is
provided for conveying the TP level of an inbound packet
transmitted over a heterogeneous network from DLSw node
650 to hybrid node 600. Referring to FIG. 8, the mechanism
comprises a packet-recognizing filter 800 having a novel
format that is generated by the hybrid node 600 and dynamically
transmitted to DLSw node 650 over a predefined
communication channel 850 of IP network 605. As
described further herein, the filter 800 enables the switching
node 650 to classify each inbound packet 810 and assign it
an appropriate TP level.

According to an aspect of the invention, the predefined
communication channel 850 may be implemented as either
an in-band channel over one of the existing TCP sessions
732-738 using novel extensions to SSP, or an out-band
channel over a newly-created TCP session. For this latter
channel implementation, the newly-created TCP session is
established in accordance with the conventional TCP transport
protocol described above.

In another aspect of the invention, the format of filter 800
is preferably customized for each channel implementation,
as depicted in FIGS. 9A and 9B. For each case, the filter
includes a well-defined, unique opcode identifying the filter
used in the illustrative network configuration, a format
identifier (FID) denoting the format of a specific inbound
packet, a local form session identifier (LFSID) that classifies
the LU-LU session context of the specific packet and a
priority identifier specifying the TP level of the packet.

FIG. 9A illustrates the format 900 of the filter 800
configured for transfer over the in-band communication
channel. Here, the opcode 904 is encapsulated by DLSw
layer 614 within a SSP header 910 along with the DLSw
circuit ID 902; since the circuit ID 902 associates an
end-to-end DLSw circuit with the LU-LU session of the
specific inbound packet, additional addressing information
is not needed. Such additional addressing information comprises
media access control (MAC) addresses of the source
node (SMAC) and destination node (DMAC) which, for this
example, are nodes 600 and 650, respectively, and service
access points (SAP) addresses of the source (SSAP) and
destination (DSAP) nodes.

The format 900 further stores the FID, LFSID and priority
identifiers within fields 912-916, respectively. The contents
of these identifiers (along with the additional addressing
information) are provided by the APPN layer 612 (FIG. 8) to
the DLSw layer 614 via an application programming interface
(API) layer 820 using a data control flow mechanism,
such as an API call 825.

FIG. 9B illustrates the format 950 of the filter 800
configured for transfer over the out-of-band channel
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embodiment. Since a new TCP session is created for this 
channel embodiment, the additional addressing information 
passed from the APPN layer 612 to the DLSw layer 614 via 
the API call 825 is used in format 950. Specifically, a defined
header 970 is generated by DLSw layer 614 for encapsu- 
lating the opcode in field 954, SMAC in field 956, DMAC 
in field 958, SSAP in field 960 and DSAP in field 962; a 
value specifying the length of the filter is stored in field 952 
of the header 970. The format 950 further accommodates the
FID, LFSID and priority identifiers within fields 972-976, 
respectively. 
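
As a rough illustration of the out-of-band format 950, the fields named above can be serialized with Python's `struct` module. The text identifies the fields but not their widths or byte order, so everything about the layout below is an assumption: a 2-byte length, 1-byte opcode, 6-byte MAC addresses, 1-byte SAPs, 1-byte FID, 4-byte LFSID and 1-byte priority, in network byte order and ordered by reference numeral. The function names and the opcode value in the usage note are likewise hypothetical.

```python
import struct

# Assumed layout (not given in the text):
# length(2) opcode(1) SMAC(6) DMAC(6) SSAP(1) DSAP(1) | FID(1) LFSID(4) priority(1)
_FILTER_FMT = "!HB6s6sBBBIB"
FILTER_LEN = struct.calcsize(_FILTER_FMT)


def pack_filter(opcode, smac, dmac, ssap, dsap, fid, lfsid, priority):
    """Serialize an out-of-band filter; the length field covers the whole filter."""
    return struct.pack(_FILTER_FMT, FILTER_LEN, opcode, smac, dmac,
                       ssap, dsap, fid, lfsid, priority)


def unpack_filter(data):
    """Parse a filter back into a dict, checking the embedded length."""
    (length, opcode, smac, dmac, ssap, dsap,
     fid, lfsid, priority) = struct.unpack(_FILTER_FMT, data[:FILTER_LEN])
    if length != FILTER_LEN:
        raise ValueError("unexpected filter length")
    return {"opcode": opcode, "smac": smac, "dmac": dmac, "ssap": ssap,
            "dsap": dsap, "fid": fid, "lfsid": lfsid, "priority": priority}
```

For example, `unpack_filter(pack_filter(0x21, b"\x00" * 6, b"\x01" * 6, 4, 4, 2, 0x1234, 1))` round-trips the identifiers; an in-band variant would drop the MAC and SAP fields and carry the DLSw circuit ID instead.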

Operation of the present inventive filter mechanism will 
now be described in connection with the flowchart of FIG. 
10. The operation starts at Step 1000 and proceeds to Step 
1010 where APPN protocol layer 612 of hybrid node 600 
passes filter-identifying information, such as the opcode, 
LFSID, FID and priority identifier, to DLSw protocol layer 
614 through API layer 820 during establishment of the 
LU-LU session. In response to the API, the DLSw layer 
encapsulates these identifiers within fields of the filter as 
described above and transfers the filter over the communi- 
cation channel 850 to the DLSw switching node 650 in Step 
1015. 

Upon receiving the filter, DLSw layer 664 of the switch- 
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(TOS) field when building an IP header for the packets 
during TCP session establishment. Thereafter, the prioritized 
packet is transferred through the core IP backbone network 
605 to the hybrid node 600 in Step 1055 and the operation 
ends in Step 1060. 

Advantageously, the present invention enables the APPN 
layer of a hybrid node to instruct the DLSw layer of a 
receiving switching node as to the TP level of an inbound 
packet destined for the hybrid node. In response to the 
instruction, the switching node assigns the inbound packet to 
a particular TCP session, where priority is preserved at 
intermediate queuing points on the basis of the value of the 
precedence bits in the IP header. 

While there has been shown and described an illustrative 
embodiment for conveying information pertaining to prior- 
ity levels of inbound packets transmitted over a heteroge- 
neous network from a switching node to a hybrid node of the 
network, it is to be understood that various other adaptations 
and modifications may be made within the spirit and scope 
of the invention. For example in an alternate embodiment of 
the invention, a different transport mechanism may be 
employed in the heterogeneous network to transport differ- 
ent packet formats among the end nodes, such as NetBIOS 



ing node 650 interprets the opcode as describing the type of ^ *^ C «J of * C . DctW ° r ^ Hc ? C ' " aiqUC > yct WeU - 



message as a filter, parses the fields of the filter and stores 
the LFSID, FID and priority identifier in a temporary storage 
location 805 of, e.g., the memory of node 650 (Step 1020). 
The layer 664 then proceeds to examine each inbound 
packet 810 prior to forwarding it to the APPN node 600. The 
inbound packets are typically Systems Network Architecture 
(SNA) type frames generated by SNA devices coupled to the 
DLSw network; these frames are, in turn, typically encap- 
sulated with conventional SNA transmission header (TH) 
heaueiE>~ rlGrll is a schematic-block .diaaram.of the format 
1100 of the TH header that DLSw layer 664 of node 650 is 
configured to operate on using the packet-recogoizing filter 
800. 

Specifically, the switching node determines the format of 



30 



defined opcode is needed to identify the filter used to convey 
the priority levels of these different inbound packets over the 
alternately-configured network. 

The foregoing description has been directed to specific 
embodiments of this invention. It will be apparent, however, 
that other variations and modifications may be made to the 
described embodiments, with the attainment of some or all 
of their advantages. Therefore, it is the object of the 
appended claims to cover all such variations and modifica- 
tions as come within the true spirit and scope of the 
invention. 
What is claimed is: 

1. Apparatus for conveying information pertaining to 
transmission priority (TP) levels of inbound packets trans- 



each inbound packet by initially examining the contents of 40 mitted over a heter °g eaeous network from a switching node 

to a hybrid node of the network, the apparatus comprising: 

a predefined communication channel interconnecting the 

hybrid and switching nodes; and 
a packet-recognizing filter generated by the hybrid node 
and dynamically transmitted to the switching node over
the predefined communication channel, the filter 
enabling the switching node to classify the inbound 
packets and assign them appropriate TP levels. 

2. The apparatus of claim 1 wherein the predefined 
communication channel is one of an in-band channel over an

existing transport session connection between the nodes and 
an out-of-band channel over a newly-created transport ses- 
sion between the nodes. 

3. The apparatus of claim 2 wherein the filter comprises 
identifiers identifying attributes of the inbound packets, such

that inbound packets matching these identifiers are associ- 
ated with appropriate TP levels. 

4. The apparatus of claim 3 wherein the identifiers com- 
prise a local form session identifier classifying the session 

context of a specific inbound packet and a priority identifier
specifying the TP level of the packet. 

5. The apparatus of claim 4 wherein the hybrid node 
comprises an Advanced Peer to Peer Networking (APPN) 
protocol layer and a Data Link Switching (DLSw) protocol 

layer, and wherein the APPN protocol layer passes the
identifiers to the DLSw protocol layer of the hybrid node 
through an application programming interface. 



a FID type field 1110 of the TH header and comparing them
with the stored FID identifier in Step 1025. It should be 
noted that the switching node is configured to recognize the 
format of TH header and, thus, can access the contents of 
any particular fields contained therein. The SNA architecture 
defines several different types of packet formats; in the 
illustrative embodiment, a FID2 format is preferably used 
for communication among the SNA devices 602,608. If the 
contents of the FID type field do not match the contents of 
the stored FID (Step 1030), a default priority is assigned to 
the inbound packet by the switching node (Step 1035). 
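
By way of illustration only, the two-stage comparison set out in this part of the description (format identifier first, then LFSID, with a default priority on any mismatch) might be sketched as follows. The names `PacketFilter`, `classify` and `DEFAULT_PRIORITY`, and the use of strings for TP levels, are assumptions made for the sketch and do not appear in the description.

```python
from dataclasses import dataclass

# Assumed placeholder: the description says only that "a default priority" is assigned.
DEFAULT_PRIORITY = "default"


@dataclass(frozen=True)
class PacketFilter:
    """Values the switching node stores after parsing a received filter."""
    fid: int        # stored format identifier (FID)
    lfsid: int      # stored local form session identifier (LFSID)
    priority: str   # TP level to assign on a full match


def classify(header_fid: int, header_lfsid: int, flt: PacketFilter) -> str:
    """Return the TP level for one inbound packet, given its TH header fields."""
    if header_fid != flt.fid:        # FID comparison fails: default priority
        return DEFAULT_PRIORITY
    if header_lfsid != flt.lfsid:    # LFSID comparison fails: default priority
        return DEFAULT_PRIORITY
    return flt.priority              # both match: stored priority identifier
```

A packet is given the stored TP level only when both comparisons succeed; any other packet falls back to the default.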

If there is a match in Step 1030, the node then accesses the 
contents of the LFSID field 1120 of the header and compares 
those contents with the contents of the stored LFSID to 
identify the LU-LU session context of the packet (Step 
1040). If the values of the LFSIDs do not match (Step 1045),
a default priority is assigned in Step 1046; otherwise, the 
switching node assigns to the inbound packet the TP level 
specified by the contents of the stored priority identifier in 
Step 1050. That is, the DLSw layer 664 maps the packet to 
a selected TCP session 732-738 (FIG. 7) based on the TP 
level specified by the priority identifier and loads the packet 
onto a corresponding queue 772-778 associated with the 
selected session. TCP/IP driver code within layer 666 of 
node 650 (FIG. 6) then maps the TP designation of the 
packet to a predetermined value of precedence bits and 
configures those bits as a "tag" within a type of service
6. The apparatus of claim 5 wherein the DLSw protocol
layer encapsulates the identifiers within fields of the filter 
and transfers the filter over the predefined communication 
channel. 

7. The apparatus of claim 6 wherein the filter further 
comprises a unique opcode identifying the filter, and 
wherein the opcode is encapsulated within a switch-to-
switch protocol header when transferring the filter over the
in-band channel. 

8. The apparatus of claim 6 wherein the filter further 
comprises a unique opcode identifying the filter, and 
wherein the opcode and additional addressing information 
are encapsulated within a defined header when transferring
the filter over the out-of-band channel. 

9. A method for conveying information pertaining to 
transmission priority (TP) levels of inbound packets trans- 
mitted over a heterogeneous network from a switching node 
to a hybrid node of the network, the method comprising the 
steps of: 

establishing a predefined communication channel inter- 
connecting the hybrid and switching nodes; 

generating a packet-recognizing filter at the hybrid node; 

dynamically transmitting the filter to the switching node 
over the predefined communication channel; and 

assigning the inbound packets TP levels at the switching 
node using the filter. 

10. The method of claim 9 wherein the hybrid node 
comprises an Advanced Peer to Peer Networking (APPN) 
protocol layer and a Data Link Switching (DLSw) protocol 
layer, and wherein the filter comprises identifiers such as a 
unique opcode identifying the filter, a format identifier (FID) 
denoting the format of a specific inbound packet, a local 
form session identifier (LFSID) classifying the session con- 
text of the specific inbound packet and a priority identifier 
specifying a TP level of the packet. 

11. The method of claim 10 wherein the step of generating 
comprises the step of passing the identifiers from the APPN 
protocol layer to the DLSw protocol layer of the hybrid node 
through an application programming interface (API). 

12. The method of claim 11 wherein the step of generating 
further comprises the step of encapsulating the identifiers 
within fields of the filter at the DLSw layer in response to the 
API. 

13. The method of claim 12 wherein the predefined 
communication channel is an in-band channel over one of a 
plurality of existing transport session connections between 
the nodes or an out-band channel over a newly-created
transport session between the nodes. 

14. The method of claim 13 wherein the step of generating 
further comprises one of the steps of: 

when transferring the filter over the one in-band channel, 
encapsulating the opcode within a switch-to-switch 
protocol header; and 

when transferring the filter over the out-band channel, 
encapsulating the opcode and additional addressing 
information within fields of a defined header. 

15. The method of claim 14 further comprising, after the 
step of dynamically transferring, the steps of: 

receiving the filter at a DLSw protocol layer of the 
switching node; 
parsing the fields of the filter;

storing the LFSID, FID and priority identifier in a tempo-
rary storage location of the switching node; and

examining each inbound packet prior to forwarding the 
inbound packet to the hybrid node. 

16. The method of claim 15 wherein the step of examining 
further comprises the steps of: 

initially determining the format of the inbound packet;
if the format of the inbound packet is equal to the format
specified by the stored FID, comparing the LFSID of
the inbound packet with the stored LFSID; and
if the LFSIDs match, assigning the inbound packet the TP
level specified by the stored priority identifier.

17. The method of claim 16 further comprising the step of, 
after the step of assigning, forwarding the inbound packet to 
the hybrid node over an appropriate one of the existing 
transport session connections. 

18. A computer readable medium containing executable
program instructions for conveying information pertaining
to transmission priority (TP) levels of inbound packets
transmitted over a heterogeneous network from a switching
node to a hybrid node of the network, the executable
program instructions comprising program instructions for:
generating a packet-recognizing filter at the hybrid node;
dynamically transmitting the filter to the switching node
over a predefined communication channel intercon-
necting the hybrid and switching nodes, wherein the
predefined communication channel is an in-band chan-
nel over one of an existing transport session connection
between the nodes and an out-band channel over a
newly-created transport session between the nodes; and
assigning the inbound packets TP levels at the switching
node using the filter.
19. The medium of claim 18 wherein the program instruc- 
tions for generating comprises program instructions for 
encapsulating identifiers within fields of the filter. 
20. The medium of claim 19 wherein the program instruc-
tions for generating further comprises program instructions 
for: 

encapsulating the opcode within a first header when 
transferring the filter over the in-band channel; and 
encapsulating the opcode and additional addressing infor-
mation within fields of a second header when transfer-
ring the filter over the out-band channel.
21. Apparatus for conveying information pertaining to
transmission priority (TP) levels of inbound packets trans-
mitted over a heterogeneous network from a switching node,
the apparatus comprising:

a hybrid node for being coupled to a predefined commu-
nication channel for interconnecting the hybrid node
and the switching node, the hybrid node being config-
ured to generate a packet-recognizing filter for being
transmitted to the switching node over the predefined
communication channel, the filter enabling the switch-
ing node to classify the inbound packets and assign
them appropriate TP levels.

* * * * * 
ABSTRACT



An improved partial packet filter (10) for filtering data
packets (210) in a computer network (12) wherein a candi-
date field (413) of the data packet (210) is hashed to a
plurality of bit-wise subsets (636) each being an independent
representation of the candidate field (413). Each of the
bit-wise subsets (636) is compared to a reference hash table
(644) which has been prepared in a preliminary operation
series (512). The preliminary operation series (512) config-
ures a plurality of target fields (714) to set selected memory
PACKET FILTERING FOR DATA
NETWORKS 

TECHNICAL FIELD 

The present invention relates generally to the field of 
computer science and more particularly to data networking 
and component devices attached to data networks. 

BACKGROUND ART 






Computer networks are becoming increasingly common 
in industry, education and the public sector. The media over 
which data are carried generally carry data in units referred 
to as "packets" which are destined for many different 
sources. Addressing and packet typing are included in most
standardized and proprietary packet based networking pro-
tocols which make use of destination address fields at the
beginning of and/or within each data packet for the purpose
of distinguishing proper recipient(s) of the data of the
packets. As a packet is received at intermediate and end
components in a system, rapid determination of the proper
recipient(s) for the data must be made in order to efficiently
accept, forward, or discard the data packet. Such determi-
nations are made based upon the above discussed address,
packet type and/or other fields within the relevant packets.
These determinations can be made by network controller
hardware alone, by a combination of hardware and software,
or by software alone. In broadcast type networks, every node
is responsible for examining every packet and accepting
those "of interest", while rejecting all others. This is called
"packet filtering". Accuracy, speed and economy of the 
filtering mechanism are all of importance. 

When the above discussed determinations are made
through a combination of hardware and software, the hard-



controller is "conditioned" with an appropriate subset of the 
specified filtering criteria, according to the filtering capa- 
bilities of that controller. The controller classifies packets 
into three categories: Those not satisfying the filter criteria 
("rejects"); those satisfying the criteria ("exact matches"); 
and those possibly satisfying the criteria ("partial matches"). 
Rejects are not delivered to the processor. Those packets 
which are classified as exact or as possible matches are 
delivered, with appropriate indications of their classifica- 
tion, to the device processor. The controller, ideally, 
excludes as many unwanted packets as its capabilities will 
allow, and the host processor (with the appropriate software 
operating therein) completes the overall filtering operation, 
as required. The value of filtering packets at the controller 
level (the partial filtering) is that it reduces the burden on the 
host processor. 

Controller filtering implementations are constrained by 
the fact that they must process packets in real-time with 
packet reception. This places a high value on filtering 
mechanisms that can be implemented with a minimum 
amount of logic and memory. Controller based filtering 
criteria are contained in a target memory. In the case of exact 
matching, a literal list of desired targets is stored in the target 
memory. While exact matching provides essentially perfect
filtering, it can be used only in applications wherein there
are a very small number of targets.

Partial filtering is employed when the potential number of 
targets is relatively large, such as is often the case in 
multicast applications. A primary consideration is the "effi- 
ciency" of the partial filter. Efficiency (E), in this context, 
may be expressed as: 



E = Tn/Pn



ware is said to have accomplished a "partial filtering" of the 
incoming packet stream. It should be noted that one type of 
packet filtering is accomplished on the basis of packet error
characteristics such as collision fragments known as "runts",
frame check sequence errors, and the like. The type of
filtering relevant to the present discussion is based upon
packet filtering in which filtering criteria can be expressed as
simple Boolean functions of data fields within the packet as
opposed to filtering based upon detection of errors or
improperly formed packets.

In the simplest case, each node of a computer network 
must capture those packets whose destination address field 
matches the node's unique address. However there fre- 
quently occur situations in which additional packets are also 
of interest. One example occurs when the node belongs to a
predefined set of nodes all of which simultaneously receive
certain specific "groupcast" packets which are addressed to
that group. Groupcast packets are usually identified by some
variation of the address field of the packet. Groupcast
address types generally fall into one of two forms. "Broad-
cast" addresses are intended for all nodes and "multicast"
addresses are targeted for specific applications to which
subsets of nodes are registered. Another case of such field-
based packet filtering occurs when certain network manage-
ment nodes are adapted to focus on specific protocols,
inter-node transactions, or the like, to the exclusion of all 
other traffic. 

Attachment of a networked device to the network is 
realized through a "controller" which operates indepen-
dently of the host processor. Packet filtering then occurs in
two successive stages beginning at the controller, which 
examines packets in real-time. To accomplish this, the 



where:
Tn=the number of target packets of interest; and
Pn=the number of potential candidates delivered to the
processor.

An efficiency of E=1.0 represents an exact filtering effi- 
ciency wherein every candidate is a desired target. This is 
the efficiency of the filtering which occurs in the "exact 
matching" previously discussed herein. While exact filtering 
efficiency is an objective, the previously mentioned con- 
straints, including that the controller must do its filtering in 
essentially real-time, will generally not allow for such 
efficiency. 

The predominant method used in the prior art for partial 
packet filtering is "hashing". The process conventionally 
begins with the extraction from each received packet of all 
fields involved in the specified filtering criteria. The com- 
posite of such relevant fields is called the "candidate field". 
Assuming an even distribution of candidate fields (a situa- 
tion that is not always literally accurate, but the assumption 
of which is useful for purposes of analysis), there will be a 
potential number of packet candidates of 2^Cb, where Cb is the
number of bits in the candidate field. The hashing function 
produces a reduction in the bit size of the candidate field 
according to a "hashing function". As a part of the initiation 
of the controller, the hashing function is applied to each field 
of the target memory to assign a "target hash value" to each 
such field. The controller memory is initialized as a bit mask 
representing the set of target hash values. Then, during 
operation, a "candidate hash value" is created by applying 
the hashing function to each candidate field. The candidate 
hash value is used as a bit index into the controller memory, 
with a match indicating a possible candidate. 
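
The conventional single-hash scheme just described can be sketched as follows, for illustration only. The 64-bit target memory size, the use of `zlib.crc32` as the hashing function, and the names `bucket`, `build_mask` and `possible_match` are assumptions of the sketch, not details taken from the prior art being described.

```python
import zlib

MB = 64  # assumed number of bits in the controller's target memory


def bucket(candidate_field: bytes) -> int:
    """Stand-in hashing function: reduce a candidate field to a bit index."""
    return zlib.crc32(candidate_field) % MB


def build_mask(target_fields) -> int:
    """Initialization: set one bit of the mask per target hash value."""
    mask = 0
    for field in target_fields:
        mask |= 1 << bucket(field)
    return mask


def possible_match(candidate_field: bytes, mask: int) -> bool:
    """Operation: use the candidate hash value as a bit index into the mask."""
    return bool((mask >> bucket(candidate_field)) & 1)
```

Every target is guaranteed to pass, but so is any other candidate that happens to hash to a set bit, which is why the result is only a partial filtering.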

As can be appreciated in light of the above discussion and 
from a general understanding of simple hashing operations,
the hashing function has the effect of partitioning the 2^Cb
candidate possibilities into Mb groups (called "buckets"),
where Mb is the number of bits in the controller's target
memory. Because candidate packets that fall into the same
bucket are not distinguished, a "hit" represents any of
2^Cb/Mb candidates. Useful hashing functions will partition
the candidate possibilities in a roughly uniform distribution
across the set of Mb buckets. For a single target, the
efficiency of such a hashing method is Mb/2^Cb. If Tn desired
targets are represented by Bn buckets (where Bn<=Tn and
Bn<=Mb), the efficiency of such a hashing method is:

E = (Tn * Mb)/(Bn * 2^Cb)

In exact matching, target memory could hold Mb/Cb
targets. Hashing is appropriate when the number of buckets 
(Bn) is larger than this figure. However, effective hashing 
also requires that the number of buckets be less than Mb, 
because as target memory density increases there is less
differentiation among candidate fields. With the target
memory full of hash targets, Bn=Mb and the efficiency is
Tn/2^Cb.
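
As a worked check of the figures in this passage, the sketch below computes E = Tn/Pn under the stated even-distribution assumption, taking Pn as Bn set buckets each admitting 2^Cb/Mb candidates. The function name `hash_efficiency` and the example values (a 48 bit candidate field and a 64 bit target memory) are assumptions chosen for illustration.

```python
from fractions import Fraction


def hash_efficiency(tn: int, bn: int, mb: int, cb: int) -> Fraction:
    """E = Tn / Pn, where Pn = Bn * 2**Cb / Mb candidates pass the filter."""
    if not (1 <= bn <= tn and bn <= mb):
        raise ValueError("require 1 <= Bn <= Tn and Bn <= Mb")
    return Fraction(tn * mb, bn * 2**cb)


# Single target in one bucket: E reduces to Mb / 2**Cb.
single = hash_efficiency(tn=1, bn=1, mb=64, cb=48)

# Target memory full of hash targets (Bn == Mb): E reduces to Tn / 2**Cb.
full = hash_efficiency(tn=100, bn=64, mb=64, cb=48)
```

Exact fractions are used so that the two limiting cases can be compared without floating point rounding.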

As can be appreciated, the described prior art hashing
method used for partial packet filtering implies a loss of
information in that a single hash value potentially represents
a large set of candidates. Clearly, it would be desirable to
reduce such loss of data. Correspondingly, it would be desirable
to maximize the filtering efficiency for a given Mb (or to
minimize the Mb for a given filter efficiency).

To the inventor's knowledge, no prior art method for 
partial packet filtering has improved efficiency or reduced 
data loss as compared to the conventional hashing method 
described above. 

DISCLOSURE OF INVENTION

Accordingly, it is an object of the present invention to 
provide a method and means for efficiently performing a 
partial filtering operation on data packets in a computer 
network.

It is another object of the present invention to provide a 
method and means for partial packet filtering which rejects 
a maximum number of incoming packets which are not of
interest without requiring a large target memory and without
unduly slowing down the processing of incoming packets.

It is still another object of the present invention to provide 
a partial packet filtering method and means which is inex- 
pensive to implement. 

It is yet another object of the present invention to provide
a partial packet filtering method and means which will 
operate in real-time or near real-time. 

It is still another object of the present invention to provide 
a partial packet filtering method and means which is adapt-
able to a variety of network system requirements.

Briefly, the preferred embodiment of the present invention 
implements multiple independent hashing functions applied 
in parallel to the candidate field of each packet. The com-
bined application of multiple independent hashing functions
results in specification of a hash matrix, with each coordi-
nate of the hash matrix being the result of one of the hashing
functions. The hash matrix includes the results of different
hashing algorithms applied to a single candidate field, or the
same hashing function applied to different subsets of the
candidate field, or a combination thereof. The filter param-
eters consist of the set of acceptable result values for each
hashing operation. 



An advantage of the present invention is that partial 
packet filtering efficiency is improved, thereby freeing the 
host processor from a substantial portion of the packet 
filtering operation. 

Yet another advantage of the present invention is that 
filtering efficiency is increased geometrically with an 
increase in target memory. 

Still another advantage of the present invention is that a 
minimum amount of target memory is required for a specific 
target efficiency. 

Yet another advantage of the present invention is that the 
partial packet filtering can be performed in a minimum
amount of time for a given target efficiency. 

These and other objects and advantages of the present 
invention will become clear to those skilled in the art in view 
of the description of the best presently known modes of 
carrying out the invention and the industrial applicability of 
the preferred embodiments as described herein and as illus- 
trated in the several figures of the drawing. 

BRIEF DESCRIPTION OF THE DRAWING 

FIG. 1 is a block diagram depicting a portion of a 
computer network with an improved partial packet filter 
according to the present invention in place therein; 

FIG. 2 is a diagrammatic representation of a conventional 
prior art Ethernet data packet; 

FIG. 3 is a diagrammatic representation of a hash table;

FIG. 4 is a flow chart showing a conventional prior art 
partial packet filtering operation; 

FIG. 5 is a block depiction of a partial packet filtering 
method according to the present invention; 

FIG. 6 is a flow chart, similar to the chart of FIG. 4,
depicting the packet processing operation series of FIG. 5; 
and 

FIG. 7 is a flow chart depicting the preliminary operation 
series of FIG. 5. 

BEST MODE FOR CARRYING OUT 
INVENTION 

The best presently known mode for carrying out the 
invention is a partial packet filter for implementation in a 
personal computer resident Ethernet controller. The pre-
dominant expected usage of the inventive improved packet
filter is in the interconnection of computer devices, particu- 
larly in network environments where there are relatively few 
targets. 

The improved partial packet filter of the presently pre- 
ferred embodiment of the present invention is illustrated in 
a block diagram in FIG. 1 and is designated therein by the 
reference character 10. In the diagram of FIG. 1, the 
improved partial packet filter 10 is shown configured as part
of a network system 12 (only a portion of which is shown in
the view of FIG. 1). In many respects, the best presently
known embodiment 10 of the present invention is structur-
ally not unlike conventional partial packet filter mecha-
nisms. Like prior art conventional partial packet filters, the
best presently known embodiment 10 of the present inven-
tion has a controller 14 with an associated target memory 16.
In the example of FIG. 1, the improved partial packet filter
10 receives data from a network node 18 and performs the
inventive improved packet filtering process on such data
before passing selected portions of the data on to a host
processor 20 to which the improved partial packet filter 10
is dedicated. 

FIG. 2 is a diagrammatic representation of a conventional 
Ethernet data packet 210. The standardized Ethernet packet 
210 has a preamble 212 which is 64 bits in length, a
destination address 214 which is 48 bits in length, a source
address 216 which is 48 bits in length, a length/type field 218
which is 16 bits in length and a data field 220 which is
variable in length from a minimum of 46 eight bit bytes to
a maximum of 1500 bytes. Following the data field 220 in
the packet 210 is a 4 byte (32 bit) frame check sequence
("FCS") 222. The packet 210 is transmitted serially begin-
ning at a "head" 224 and ending at a "tail" 226 thereof. The
preamble 212, destination address 214, source address 216
and length/type field 218 are collectively referred to as the
header 219. 
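
For illustration, the field widths given above (64, 48, 48 and 16 bits) fix the header at 22 bytes, which the sketch below splits into its parts. The function name `split_header`, the dictionary keys and the byte-oriented view of a serially transmitted packet are assumptions made for the sketch.

```python
import struct

# preamble (8 bytes), destination (6), source (6), length/type (2), network order
HEADER_FORMAT = "!8s6s6sH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 22 bytes, i.e. 176 bits


def split_header(packet: bytes) -> dict:
    """Split the leading header bytes of a packet into their four fields."""
    if len(packet) < HEADER_SIZE:
        raise ValueError("packet shorter than header")
    preamble, dst, src, length_type = struct.unpack(
        HEADER_FORMAT, packet[:HEADER_SIZE]
    )
    return {
        "preamble": preamble,
        "destination": dst,
        "source": src,
        "length_type": length_type,
    }
```

Any of these fields, or a combination of them, could then serve as the candidate field handed to a filter.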

FIG. 3 is a diagrammatic representation of a conventional 
single dimensional hash table 310 with which one skilled in 
the art will be familiar. The hash table 310 has a plurality of 
address locations 312 each of which can be "set" (set to 1) 
or left unset (set to zero). 

FIG. 4 is a flow diagram depicting the operation of a 
conventional prior art partial packet filtering operation 410. 
As previously discussed briefly, a packet 210 (FIG. 2) is 
received (receive packet operation 412) from the network 18 
(FIG. 1) and a candidate field 413 (such as the header 219 
of the packet 210) is extracted (extract candidate field 
operation 414). A hashing operation 416 is performed on the 
extracted candidate field 413 to produce a hash value 417 
and the hash value 417 is compared to the hash table 310 
(FIG. 3) stored in the target memory 16 (FIG. 1) in a 
comparison operation 418. If the result of the comparison 
operation 418 is a match, the packet 210 is forwarded in a 
forward packet operation 420. If the result of the comparison
operation 418 is not a match, the packet 210 is rejected in a
reject packet operation 422. It should be remembered that
the use of the header 219 here is an example only, and any 
portion or combined portions of the packet 210 might 
constitute the candidate field 413 in a given application. 

FIG. 5 is a flow diagram depicting the inventive improved 
packet filtering process 510. The improved packet filtering 
process 510 is accomplished in a preliminary operation 
series 512 and a packet processing operation 514, each of 
which is repeated as required, as will be discussed herein-
after. The preliminary operation series 512 is accomplished
according to software residing in the host processor 20 (FIG.
1) to configure the target memory 16 (FIG. 1) as will be
discussed hereinafter. It should be noted that the fact that the
improved packet filtering process 510 is divided into the two
main operation categories (the preliminary operation series
512 and the packet processing operation 514) does not
distinguish this invention over the prior art. Rather, the
processes within the preliminary operation series 512 and
the packet processing operation 514 describe the essence of
the inventive process. 

FIG. 6 is a flow chart showing the inventive packet 
processing operation 514 in a manner analogous to the 
presentation of the prior art partial packet filtering operation 
410 depicted in FIG. 4. As can be seen in the view of FIG.
6, the packet processing operation series 514 is similar in
many respects to the prior art partial packet filtering process
410 (FIG. 4). In the packet processing operation series 514,
a packet 210 (FIG. 2) is received (receive packet operation
412) and a candidate field 413 is extracted in an extract
candidate field operation 414. In the best presently known 
embodiment 10 of the present invention, the inventive 



packet processing operation series 514 next performs a 
candidate field reduction operation 626. In the best presently 
known embodiment 10 of the present invention, the candi- 
date field reduction operation 626 is merely the application 
of the conventional CRC polynomial algorithm to the can- 
didate field 413 to yield a 32 bit CRC output value 628 
(although any of a number of similar algorithms might be 
applied for this purpose). Next, a subset selection operation 
630 selects a predetermined number (two in the example of 
FIG. 6) of bit-wise subsets 636 from the CRC output value 
628. The method for determining the quantity of bit-wise
subsets 636 to be selected in the subset selection operation
630, and the size of each, will be discussed hereinafter. In the
best presently known embodiment 10 of the present inven-
tion, the bit-wise subsets 636 are each 6 bits in length. It
should be noted that, in the best presently known embodi-
ment 10 of the present invention, the bit-wise subsets 636
are selected from the CRC output value 628 simply by
taking the first 6 bits of the CRC output value 628, the
second six bits, and so on until as many bit-wise subsets as
are needed are obtained and so, in the best presently known
embodiment 10 of the present invention, the bit-wise subsets
636 are "consecutive bit sections" of the fixed size field (the
CRC output value 628 in the best presently known embodi-
ment 10 of the present invention). The inventors have deter-
mined that the bits of the CRC output value 628 (resulting
from the CRC polynomial function) are independent of each 
other, and so any 6 bit portion of the CRC output value 628 
is as representative of the CRC output value 628 as is any 
other 6 bit portion. 
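
The reduction and subset selection steps described above might be sketched as follows. This is a minimal illustration assuming that `zlib.crc32` stands in for "the conventional CRC polynomial algorithm" and that "the first 6 bits" means the most significant bits of the 32 bit value; neither choice is confirmed by the description, and the name `bitwise_subsets` is hypothetical.

```python
import zlib

SUBSET_BITS = 6   # size of each bit-wise subset in the described embodiment
CRC_BITS = 32     # size of the CRC output value


def bitwise_subsets(candidate_field: bytes, count: int = 2) -> list:
    """Reduce the candidate field to a CRC, then take consecutive 6 bit sections."""
    if count * SUBSET_BITS > CRC_BITS:
        raise ValueError("not enough CRC bits for the requested subsets")
    crc = zlib.crc32(candidate_field) & 0xFFFFFFFF  # candidate field reduction
    subsets = []
    for i in range(count):
        # Assumed bit order: section 0 is the most significant 6 bits.
        shift = CRC_BITS - SUBSET_BITS * (i + 1)
        subsets.append((crc >> shift) & ((1 << SUBSET_BITS) - 1))
    return subsets
```

With 6 bit sections, at most five subsets can be drawn from one 32 bit value, and each subset is a value from 0 through 63.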

The bit-wise subsets 636 are then compared to the hash 
table 310 (FIG. 3) stored in the target memory 16 (FIG. 1) 
in a comparison operation 642. The combined multiple hash 
values 636 may be considered to be a hash matrix 638 (in the 
example of FIG. 6, a two dimensional hash matrix 638).

It is important to note that the essence of the present 
inventive method lies in the extraction of the plurality of 
independent or relatively independent representative indices 
of the candidate field 413 ("candidate field indices") which,
in the example of the best presently known embodiment 10 
of the present invention are the bit-wise subsets 636 which 
make up the hash matrix 638. That is, the bit-wise subsets 
636 are representative fields in that the bit-wise subsets 636 
are representative of the candidate field 413, as discussed 
above. The generally simultaneous (parallel) processing of 
these is the source of the advantages of the present inventive 
method and means. The exact method described herein in 
relation to the best presently known embodiment 10 of the 
present invention, that of first reducing the candidate field 
413 in the candidate field reduction operation 626 and then 
extracting the bit-wise subsets 636 is but one of many 
potential methods for accomplishing such a parallel hashing 
operation 639, and the present invention is not intended to 
be limited by this aspect of the best presently known 
embodiment 10. 

In the best presently known embodiment 10 of the present 
invention, in a comparison operation 642, each of the 
bit-wise subsets 636 is compared to a reference hash table
644 (a "target hash array") stored in the target memory 16 
(FIG. 1) and only if all match is the packet 210 forwarded 
in a packet forwarding operation 646. In the example of FIG. 
6, the reference hash table 644 will be a 64 element array 
representing all values from 0 through 63 inclusive. Some 
elements of the reference hash table 644 are set as will be 
discussed hereinafter in relation to the preliminary operation 
series 512. If the value of the bit-wise subset "falls into one 
of the buckets" (is equivalent to a corresponding set bit in 
the reference hash table 644), then the data packet 210 is
defined as being a "match". 
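
Putting the pieces together, the comparison against a 64 element reference hash table, with a packet forwarded only if every bit-wise subset falls into a set bucket, can be sketched as below. The sketch is illustrative only: `zlib.crc32`, the most-significant-first bit order, and the names `subsets`, `build_reference_table` and `is_match` are assumptions, and the table is populated here by applying the same hashing to each target field, in the manner described for the preliminary operation series.

```python
import zlib

SUBSET_BITS = 6
TABLE_SIZE = 1 << SUBSET_BITS  # 64 elements, values 0 through 63


def subsets(field: bytes, count: int = 2) -> list:
    """Consecutive 6 bit sections of the CRC of a field (assumed MSB first)."""
    crc = zlib.crc32(field) & 0xFFFFFFFF
    return [(crc >> (32 - SUBSET_BITS * (i + 1))) & (TABLE_SIZE - 1)
            for i in range(count)]


def build_reference_table(target_fields, count: int = 2) -> list:
    """Set the element for every bit-wise subset of every target field."""
    table = [False] * TABLE_SIZE
    for field in target_fields:
        for value in subsets(field, count):
            table[value] = True
    return table


def is_match(candidate_field: bytes, table, count: int = 2) -> bool:
    """Forward only if all bit-wise subsets land on set elements."""
    return all(table[value] for value in subsets(candidate_field, count))
```

Because a non-target candidate must hit a set element with each of its subsets, requiring all of them to match rejects more unwanted packets than a single hash value of the same table would.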

Now returning to a consideration of the preliminary 
operation series 512 (FIG. 5) with an understanding of the 
packet processing operation series 514, the target memory 
16 is configured in process steps much like those described 
in relation to the packet processing operation series 514 of 
FIG. 6. 

FIG. 7 is a flow diagram of the preliminary operation
series 512 according to the best presently known embodi- 
ment 10 of the present invention. A preliminary operation 
which is common to both the prior art and the present 
invention is a target field(s) selection process 712. The target 
(field) s selection process is merely the selection of criteria 



10 



in the present example). It should be noted that a target 
parallel hashing operation 739 is like the previously 
described parallel hashing operation 639 in that the inven- 
tion might be practiced with variations of the specific steps 
therein which are presented here as features of the best 
presently known embodiment 10 of the present invention. 

In a target memory setting operation 740 the reference 
hash table 644 is formatted such that each memory location 
312 corresponding to a value of any of the target bit- wise 
subsets 736 is set. For example, if the first target bit- wise 
subset 736a were "000010" (decimal value 2) then the third 
memory location 312c in the reference hash tabic 644 would 
be set to 'V, as is illustrated in FIG. 7. As can be appreciated 
from the above discussion, the maximum number of 



example, if the entire process is «>. be ■» the basis of desired can be set by this process is the quantity of target bit-wise 
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destinations, then an intended destination address 214 (FIG. 
2) will be (one of) the target field(s) 714, and if three 
destinations are of interest, then there will be three target 
fields 714 as illustrated in the example of FIG. 7. The actual 
process involved in selecting the target field(s) is a function 
of network control software which is found in the prior art 
and which is not relevant to the present invention excepi to 
the extent that it delivers the target field(s) 714 to the 
inventive preliminary operation series 512. 

Having determined the quantity of target fields 714 of 
interest, host software will next determine a bit- wise subset 
quantity 716 (the appropriate "subset quantity" of bit-wise 
subset 636) in a bit-wise subset quantity determination 
operation 718. The bit- wise subset quantity determination 
operation 718 will be discussed in more detail hereinafter, as 
it can be better understood in light of the present description 
of the entire preliminary operation series 512. For the 
present simplified example of FIGS. 6 and 7, and as already 
mentidhedruie bit-wise sub»l quarmiy 716 is tv;c.-That is, -3* 
two of the bit- wise subsets 636 are to be extracted from the 
CRC output value 628 in the subset selection operation 630 
of FIG. 6. 

As can be appreciated, the target fields 714 are each 
equivalent in form to the candidate fields 413 discussed 40 
previously herein, and processing of the target fields 714 is 
much the same as has been previously described herein in 
relation to the candidate fields 413. In the inventive prelimi- 
nary operation series 512, each of the target fields 714 is 
processed in a target field reduction operation 726 by 
application of the CRC polynomial to produce a target CRC 
value 728. Each of the target CRC values 728 is then 
processed in a target subset selection operation 730 to 
produce a plurality (two for each target CRC value 728 for 
a total of six, in the present example) of target bit-wise 
subsets 736. In more general terms, each of the "target fields 
714 (having been selected according to prior art methods as 
discussed previously, herein) is processed as described to 
produce a "target representative field" (the target CRC value 
728 in the present example), which is then further processed 55 
as described to produce the "target indices", which target 
indices may be "target string subsets38 of the target repre- 
sentative field and which are, in the present example, the 
target bit-wise subsets 736. This process is alike to the 
process which is repeated as necessary to process each 
incoming data packet 210, wherein the candidate fields 413 
are processed to produce a candidate representative field (the 
CRC output value 628 in the present example), which is 
further processed to produce the "candidate string subsets" 
(the bit-wise subsets 636 in the present example). The 
quantity of target bit- wise subsets 736 taken from each target 
CRC value 728 is also the bit-wise subset quantity 716 (two, 



subsets 736 (six, in the present example). However, since 
two or more of the target bit- wise subsets might coinciden- 
tally hash to the same value, a lesser quantity of memory 
locations 312 might also be set. 
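
The target memory setting operation 740 can be illustrated with a short sketch. It is offered under stated assumptions rather than as the embodiment itself: `zlib.crc32` stands in for the CRC polynomial, the 6-bit target indices are taken as adjacent low-order slices of the CRC value, and the function names are hypothetical.

```python
import zlib

TABLE_SIZE = 64  # memory locations 312 in the example of FIGS. 6 and 7
INDEX_BITS = 6   # each target bit-wise subset is a value from 0 through 63


def target_indices(target_field: bytes, quantity: int) -> list[int]:
    """Reduce one target field to a CRC value and extract `quantity` subsets."""
    crc = zlib.crc32(target_field)  # assumption: stand-in CRC polynomial
    mask = (1 << INDEX_BITS) - 1
    # Assumption: subsets are adjacent slices starting at the low-order bits.
    return [(crc >> (i * INDEX_BITS)) & mask for i in range(quantity)]


def build_reference_table(target_fields: list[bytes], quantity: int = 2) -> list[int]:
    """Set the memory location addressed by every target bit-wise subset."""
    table = [0] * TABLE_SIZE
    for field in target_fields:
        for index in target_indices(field, quantity):
            table[index] = 1  # e.g. subset "000010" sets location index 2
    return table
```

With three target fields and a subset quantity of two, at most six locations are set, and fewer if two subsets coincide, which agrees with the count given in the text.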

Now returning to a more detailed discussion of the 
bit-wise subset quantity determination operation 718, the 
target memory 16 is to be configured to maximize the 
effectiveness of the filtering based on the quantity of multicast 
packets 210 of interest to the software of the host processor 20. 
Therefore, the bit-wise subset quantity determination operation 718 
attempts to determine (or, at least, to approximate) an optimal 
number of indices per packet (and, 
thus, the bit-wise subset quantity 716 discussed previously 
herein). The "optimal" number here means that which will 
minimize the number of "uninteresting" packets which 
match the set data bits 312 in the reference hash table 644 
while matching all of the "interesting" packets 210. In the 
best presently known embodiment 10 of the present inven- 
tion the following table is used to determine the bit-wise 
subset quantity 716. 



TABLE OF SUBSET QUANTITIES 

Addresses of Interest    Number of Hash Indices 
                         (Bit-Wise Subset Quantity 716) 

1-2                      5 
3                        4 
4-9                      3 
10-16                    2 
17 or more               1 



The above table is offered here as a guide only, in that the 
"optimal" number of selected hash indices may vary in ways 
not presently contemplated. Furthermore, it should be noted 
that the above table is based upon an assumption that none 
of the target indices (the target bit-wise subsets 736 in the 
best presently known embodiment 10 of the present invention) 
hash to the same memory locations 312 in the reference 
hash table 644. If, indeed, two or more of the target bit-wise 
subsets 736 did hash to the same memory location 312, then 
additional hash indices could be added to increase efficiency 
without sacrificing speed or requiring additional memory or 
processing. 
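
The table lookup performed in the bit-wise subset quantity determination operation 718 might be expressed as the following sketch. The function name is hypothetical, and the value of 4 for three addresses of interest is read from the descending sequence of the table, since that row is unclear in the scanned text.

```python
def subset_quantity(addresses_of_interest: int) -> int:
    """Return the bit-wise subset quantity for a number of target addresses."""
    if addresses_of_interest < 1:
        raise ValueError("at least one address of interest is required")
    if addresses_of_interest <= 2:
        return 5
    if addresses_of_interest == 3:
        return 4  # assumption: inferred from the 5, 4, 3, 2, 1 sequence
    if addresses_of_interest <= 9:
        return 3
    if addresses_of_interest <= 16:
        return 2
    return 1  # 17 or more addresses
```

As the text notes, the table is a guide only, so host software could start from this value and adjust it when target subsets are found to collide.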

It should be noted that while the packet processing 
operation series 514 is accomplished in the hardware of the 
best presently known embodiment 10 of the present inven- 
tion, the preliminary operation series (which can be accom- 
plished at a more leisurely pace) is performed primarily by 
software of the host processor 20. As can be appreciated in 
light of the above discussion, the preliminary operation 
series will be repeated when the network 12 is reconfigured, 
when it is desired to communicate with additional members 
of the network 12, or upon other occasions according to the 
needs of the user and the network 12. The packet processing 
operation series 514 will be repeated whenever an incoming 
packet is detected from the network node 18. 

It should also be noted that, while the best presently 
known embodiment 10 of the present invention hashes each 
of the CRC values 628 and 728 to a common reference hash 
table 644, the invention might be practiced with equal 
efficiency by hashing each of the CRC values 628 and 728 
to its own individual hash table (not shown). Using the 
quantities of the example of FIGS. 6 and 7, each of the 
individual hash tables would be 32 bits (memory locations 
312) large (one half of 64 bits, since it must be divided 
between the two target CRC values 728). The individual 
bit-wise subsets 636 and 736 would then be 5 bits long 
(decimal value 0 through 31). 
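
The apportioned variant described above can be sketched as follows, using the quantities of the example: two individual tables of 32 locations each, addressed by 5-bit subsets. This is an illustration under assumptions (`zlib.crc32` as the CRC, adjacent low-order 5-bit slices, hypothetical function names), not a description of the unshown individual hash tables themselves.

```python
import zlib

PORTIONS = 2       # one individual table per bit-wise subset
PORTION_SIZE = 32  # one half of the 64 locations in the common table
PORTION_BITS = 5   # 2**5 == 32, so values run from 0 through 31


def portion_indices(field: bytes) -> list[int]:
    """Extract one 5-bit index per individual table from the CRC of a field."""
    crc = zlib.crc32(field)  # assumption: stand-in CRC polynomial
    mask = (1 << PORTION_BITS) - 1
    return [(crc >> (p * PORTION_BITS)) & mask for p in range(PORTIONS)]


def set_targets(target_fields: list[bytes]) -> list[list[int]]:
    """Configure the individual tables from the target fields."""
    tables = [[0] * PORTION_SIZE for _ in range(PORTIONS)]
    for field in target_fields:
        for p, index in enumerate(portion_indices(field)):
            tables[p][index] = 1
    return tables


def is_match(field: bytes, tables: list[list[int]]) -> bool:
    """Each subset is checked only against its own individual table."""
    return all(tables[p][i] for p, i in enumerate(portion_indices(field)))
```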

Various modifications may be made to the inventive 
improved packet filter 10 without altering its value or scope. 
For example, the quantity, size, and derivation of the plurality 
of bit-wise subsets 636 and 736 could readily be 
revised according to the parameters discussed herein. 

All of the above are only some of the examples of 
available embodiments of the present invention. Those 
skilled in the art will readily observe that numerous other 
modifications and alterations may be made without departing 
from the spirit and scope of the invention. Accordingly, 
the above disclosure is not intended as limiting and the 
appended claims are to be interpreted as encompassing the 
entire scope of the invention. 



INDUSTRIAL APPLICABILITY 



The improved partial packet filter 10 is adapted to be 
widely used in computer network communications. The 
predominant current usages are for the interconnection of 
computers and computer peripheral devices within networks 
and for the interconnection of several computer networks. 

The improved partial packet filters 10 of the present 
invention may be utilized in any application wherein conventional 
computer interconnection devices are used. A 
significant area of improvement is in the inclusion of the 
parallel processing of a plurality of indices (bit-wise subsets 
636) of a packet. 

The efficiency of the filtering provided by the improved 
partial packet filter 10 is significantly improved, particularly 
for cases where the number of targets is small relative to the 
number of "buckets" (memory locations 312). To compare 
the efficiency of the present inventive improved packet 
filtering process 510 embodied in the improved partial 
packet filter 10 with the prior art partial packet filtering 
process 410, assume, for example, the following values: 

Mb=64 (representing 64 memory locations 312 in the 
reference hash table 644) 

Cb=48 (representing a 48 bit candidate field 413 size, a 
typical size of the destination address 214) 

Dn=4 (representing a bit-wise subset quantity 716 of four) 

Then, the prior art partial packet filtering process 410 will 
partition the 2^48 possibilities among 64 distinct buckets, one 
of which matches the bucket into which the single target 
falls. In the improved packet filtering process 510, the four 
parallel hashing functions partition among 16 possible buckets 
each. The efficiency (Ef) for the prior art partial packet 
filtering process 410 would then be: 



The efficiency (Ef4) for this example of the improved 
packet filtering process 510 is: 

The efficiency Ef4 is better than the efficiency Ef by a 
factor of 2^10 (1024), which is to say that only a thousandth 
as many (uninteresting) packets will be delivered to the next 
stage of filtering using the inventive improved partial packet 
filter 10 as compared to the prior art. 
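
The two efficiency expressions themselves do not survive in this copy of the text. As a hedged reconstruction from the stated values only (Cb=48, Mb=64, Dn=4, a single target, and the stated factor of 1024), and assuming that each efficiency figure counts the candidate field values that fall into the matching bucket or buckets, the arithmetic would be:

```latex
\begin{align*}
E_f    &= \frac{2^{C_b}}{M_b} = \frac{2^{48}}{2^{6}} = 2^{42}, \\
E_{f4} &= \frac{2^{C_b}}{\left(M_b / D_n\right)^{D_n}}
        = \frac{2^{48}}{\left(2^{4}\right)^{4}} = 2^{32}, \\
\frac{E_f}{E_{f4}} &= \frac{2^{42}}{2^{32}} = 2^{10} = 1024.
\end{align*}
```

The ratio agrees with the factor given in the text. The individual expressions are an inference and may differ in form from the original equations.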

Filtering of packets may be accomplished through a 
combination of exact and partial match filters. Typically, one 
or more partial filterings will occur first, with the multiple 
dimensions of each filtering accomplished in parallel with 
each other (according to the present invention). Packets 
which pass through the inventive improved partial packet 
filter 10 may then be filtered using an exact match filter 
technique, such as "binary search lookup" of the filter data 
in a sorted table of acceptable filter data values. Furthermore, 
results of partial filtering can be used to determine 
which of many (possibly sorted) tables to search for 
the packet. 
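
The combination of a partial filter followed by an exact match filter can be sketched as below. The sketch assumes `zlib.crc32` as the CRC, a 64-location table with two 6-bit indices per address, and Python's `bisect` module for the "binary search lookup". All function names are hypothetical.

```python
import bisect
import zlib


def partial_pass(address: bytes, table: list[int],
                 quantity: int = 2, bits: int = 6) -> bool:
    """First stage: every hash index of the address must hit a set location."""
    crc = zlib.crc32(address)  # assumption: stand-in CRC polynomial
    mask = (1 << bits) - 1
    return all(table[(crc >> (i * bits)) & mask] for i in range(quantity))


def exact_pass(address: bytes, sorted_addresses: list[bytes]) -> bool:
    """Second stage: binary search in a sorted table of acceptable values."""
    pos = bisect.bisect_left(sorted_addresses, address)
    return pos < len(sorted_addresses) and sorted_addresses[pos] == address


def accept(address: bytes, table: list[int],
           sorted_addresses: list[bytes]) -> bool:
    """Run the costlier exact lookup only for packets that pass stage one."""
    return partial_pass(address, table) and exact_pass(address, sorted_addresses)
```

Because `and` short-circuits, the binary search runs only for the small fraction of packets that survive the partial filter, which is the point of ordering the stages this way.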

Accordingly, the inventive improved packet filtering pro- 
cess 510 may be applied more than once to each incoming 
packet 210 (in a first stage and a second stage). In such an 
example, configuration of the first stage partial filtering 
would involve specification of the number and type of 
hashing operations to be performed, along with the portion 
of the packet which is to comprise the filter data for each 
such operation, along with acceptable results for each. 
Multiple partial filterings may be configured with the speci- 
fication including the logical relation to apply to the results 
of each filtering. For example, partial filtering A might be to 
apply the 32 bit CRC polynomial to the destination address 
field of an Ethernet packet, and retain the lowest order 3 
bits (a value from 0 to 7). Partial filtering B might be to 
apply the 32 bit CRC polynomial to the source address field 
of the Ethernet packet, and retain the lowest order 3 bits. The 
logical relation might be to accept only packets for which the 
result of the first filtering (A) is either 2 or 4, and the result 
of the second filtering (B) is either 3 or 4. In a general 
case, one may expect the likelihood of arbitrary filter data 
"passing" the first filtering to be 2 in 8 (25%), since 2 of the 
8 values from 0 to 7 are acceptable. Similarly, the likelihood 
of such filter data "passing" the second filtering is 2 in 8 (25%). 
Assuming that the two filterings are, as desired, truly independent, 
the likelihood of this arbitrary packet being 
accepted is the product of these, or 1 in 16. Note further that 
the specification of these "acceptable result sets" ({2,4} for 
A and {3,4} for B) requires 16 bits of information for full 
specification, where 8 bits indicate the acceptability/unacceptability 
of each of the 8 possible values of filtering A, and 
8 additional bits indicate the acceptability/unacceptability of 
each of the 8 possible values of filtering B. Use of such 
multiple partial filterings may be especially effective in 
situations where filtering criteria are derived from indepen- 
dent portions of the filter data, such as filtering for all 
packets whose destination address OR whose source address 
is within a set of interesting addresses AND whose packet 
type indicates a particular protocol of interest. 
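
The example of partial filterings A and B can be worked through in a short sketch. It encodes each "acceptable result set" as an 8-bit mask, giving the 16 bits of specification mentioned above, and enumerates all 64 result pairs to confirm the 1 in 16 acceptance rate. `zlib.crc32` is an assumed stand-in for the 32 bit CRC polynomial, and the names are hypothetical.

```python
import zlib

ACCEPT_A = 0b00010100  # bits 2 and 4 set: results {2, 4} pass filtering A
ACCEPT_B = 0b00011000  # bits 3 and 4 set: results {3, 4} pass filtering B


def low3(field: bytes) -> int:
    """Apply the CRC to a field and retain the lowest order 3 bits (0 to 7)."""
    return zlib.crc32(field) & 0b111


def accept(dest: bytes, src: bytes) -> bool:
    """Logical relation: A on the destination AND B on the source must pass."""
    a_ok = (ACCEPT_A >> low3(dest)) & 1
    b_ok = (ACCEPT_B >> low3(src)) & 1
    return bool(a_ok and b_ok)


def pass_fraction() -> float:
    """Fraction of the 8 x 8 result pairs accepted, assuming independence."""
    passing = sum(
        1
        for a in range(8)
        for b in range(8)
        if (ACCEPT_A >> a) & 1 and (ACCEPT_B >> b) & 1
    )
    return passing / 64
```

Here `pass_fraction()` counts 4 accepted pairs out of 64, which is the 1 in 16 figure given in the text.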

Since the improved partial packet filters of the present 
invention may be readily constructed and are compatible 
with existing computer equipment, it is expected that they 
will be acceptable in the industry as substitutes for conventional 
means and methods presently employed for partial 
packet filtering. For these and other reasons, it is expected 
that the utility and industrial applicability of the invention 
will be both significant in scope and long-lasting in duration. 
We claim: 

1. A method for selectively forwarding a data packet and 
controlling the distribution of data packets in a computer 
network system, the data packet having a candidate field 
containing information about the data packet, the method 
comprising: 

configuring a target memory of a controller to contain a 
target hash array in steps including: 
aa determining a target field and extracting a plurality 
of target indices from said target field, the target 
indices being a binary number having a value; 
ab setting memory locations in the target memory 
corresponding to the value of each of the target 
indices; and 
processing the data packet in steps including: 
ba extracting the candidate field from the data packet; 
bb extracting from the candidate field a plurality of 
candidate field indices; 
bc comparing the values of each of the candidate field 
indices to the target hash array; and 
bd forwarding the packet when each of the values of 
each of the candidate field indices corresponds to a 
memory location of the target hash array which was 
set in step ab. 

2. The method of claim 1, wherein: 

step aa is accomplished in substeps including: 

aa1 reducing the target fields to a plurality of target 
representative fields; and 
aa2 selecting one or more target string subsets from the 
target representative field; and 
step bb is accomplished in substeps including: 
bb1 reducing the candidate field to a plurality of 
candidate representative fields; and 
bb2 selecting one or more candidate string subsets from 
the target representative field. 

3. The method of claim 1, wherein: 

step ab is accomplished by causing only those memory 
locations in the target memory which correspond to the 
value of each of the target string subsets to contain a 
value of one. 

4. The method of claim 2, wherein: 

step aa1 is accomplished by applying a cyclic redundancy 
check algorithm to each of the target fields; and 

step bb1 is accomplished by applying the same cyclic 
redundancy check algorithm to the candidate field. 

5. The method of claim 2, wherein: 

in step aa2 the target string subsets are selected by 
extracting a plurality of target bit-wise subsets from the 
target representative field; and 
in step bb2 the candidate string subsets are selected by 
extracting a plurality of candidate bit-wise subsets from 
the representative candidate field. 

6. The method of claim 2, and further including: 

an additional process step preceding step ab wherein a 
subset quantity is determined, the subset quantity being 
the number of target string subsets to be extracted from 
each of the target representative fields and also the 
number of candidate string subsets to be extracted from 
each of the candidate representative fields. 

7. The method of claim 6, wherein: 

the additional process step is accomplished, at least initially, 
by selecting the subset quantity from a table of 
subset quantities. 

8. The method of claim 2, wherein: 

each of the target representative field and the candidate 
representative field are 32 bits in length. 

9. The method of claim 1, wherein: 

steps aa through ab are repeated when a change in the 
distribution of data packets is desired. 

10. The method of claim 1, wherein: 

steps ba through bd are repeated for each incoming data 
packet. 

11. The method of claim 1, and further including: 
an additional process step preceding step ab wherein a 
subset quantity is determined, the subset quantity being 
the number of target indices to be extracted from each 
of the target fields and also the number of candidate 
indices to be extracted from each of the candidate 
fields. 

12. The method of claim 11, wherein: 

the additional process step is accomplished, at least initially, 
by selecting the subset quantity from a table of 
subset quantities appropriate to a quantity of target 
quantities. 

13. The method of claim 1, wherein: 

the candidate field includes a target address field of the 
data packet. 

14. The method of claim 1, wherein: 

the data packet is a standardized Ethernet data packet. 

15. The method of claim 1, wherein: 

the target hash array is an unapportioned array such that 
each of the target indices is used to set memory 
locations in that unapportioned array. 

16. The method of claim 1, wherein: 

the target hash array is apportioned such that at least some 
of the target indices are directed to different portions of 
the target hash array. 

17. The method of claim 1, wherein: 

the target indices and the candidate indices are each a 
binary string of fixed bit length. 

* * * * * 